
Current Bioinformatics: Latest Articles

DMR_Kmeans: Identifying Differentially Methylated Regions Based on kmeans Clustering and Read Methylation Haplotype Filtering
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-10-06 · DOI: 10.2174/0115748936245495230925112419
Xiaoqing Peng, Wanxin Cui, Xiangyan Kong, Yuannan Huang, Ji Li
Background: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can be used to reveal mechanisms of gene regulation and to screen for diseases. Many methods have been proposed to detect DMRs from bisulfite sequencing data. In these methods, differentially methylated CpG sites and DMRs are usually identified with statistical tests or distribution models, which neglect the joint methylation statuses provided by each read and result in inaccurate DMR boundaries. Objective: To make use of the joint methylation statuses provided by each read and predict accurate DMR boundaries. Methods: This paper proposes DMR_Kmeans, a method that detects DMRs based on k-means clustering and read methylation haplotype filtering. For each CpG site, the k-means algorithm clusters the methylation levels from the two groups, and the methylation difference of the CpG is measured from how the two groups are distributed across the clusters. Methylation haplotypes of reads are then used to extract the methylation patterns in each candidate region. Finally, DMRs are identified based on the methylation differences and methylation patterns in the candidate regions. Results: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. At a methylation-difference threshold greater than 0.4, DMR_Kmeans achieves higher Qn and Ql and overlaps more promoters than the other methods, indicating that its DMRs have accurate boundaries and contain fewer CpGs with small methylation differences than those predicted by the other methods. Conclusion: Because the DMRs predicted by DMR_Kmeans have a greater total length and contain more CpG sites in total than those of the other methods, DMR_Kmeans can provide a high-quality DMR set for downstream analysis.
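The per-CpG clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: k is fixed at 2, clustering is plain 1-D Lloyd's k-means on the pooled per-sample methylation levels (values in [0, 1]) of both groups, and the difference score (the absolute gap between the fractions of each group's samples that fall in the high-methylation cluster) is a hypothetical stand-in for the measure the abstract describes. The function names are also hypothetical, and the read methylation haplotype filtering step is not shown.

```python
def kmeans_1d_two(values, max_iter=100):
    """Cluster 1-D values into two groups with Lloyd's k-means (k = 2).

    Returns one label per value: 0 = cluster with the lower centroid,
    1 = cluster with the higher centroid.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: nothing to separate, put everything in cluster 0.
        return [0] * len(values)
    c0, c1 = lo, hi  # initialise the two centroids at the extremes
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties go to the lower cluster.
        new_labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        # Both clusters stay non-empty because min(values) is always nearest
        # to c0 and max(values) is always nearest to c1.
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        c0, c1 = sum(low) / len(low), sum(high) / len(high)
    return labels


def methylation_difference(group_a, group_b):
    """Hypothetical per-CpG difference score.

    group_a / group_b: methylation levels (0..1) of one CpG site in each sample
    of the two groups. The pooled levels are clustered with k = 2, and the score
    is the absolute difference between the fractions of each group's samples in
    the high-methylation cluster (0 = both groups split the same way,
    1 = the two groups fall entirely in different clusters).
    """
    pooled = list(group_a) + list(group_b)
    labels = kmeans_1d_two(pooled)
    labels_a, labels_b = labels[:len(group_a)], labels[len(group_a):]
    frac_a = sum(labels_a) / len(labels_a)
    frac_b = sum(labels_b) / len(labels_b)
    return abs(frac_a - frac_b)
```

With this definition, a CpG whose two groups separate cleanly, such as `methylation_difference([0.9, 0.85, 0.95], [0.1, 0.05, 0.2])`, scores 1.0, while one whose groups are mixed identically across both clusters scores 0.0.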
Citations: 0
A Deep Neural Network Model with Attribute Network Representation for lncRNA-Protein Interaction Prediction
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-10-06 · DOI: 10.2174/0115748936267109230919104630
Meng-Meng Wei, Chang-Qing Yu, Li-Ping Li, Zhu-Hong You, Lei Wang
Background: LncRNAs are involved not only in regulating the biological functions of protein-coding genes; their dysfunction is also associated with the occurrence and progression of various diseases. A growing number of studies have shown that an in-depth understanding of the mechanisms of action of lncRNAs is of great significance for disease treatment. However, traditional wet-lab experiments are time-consuming, laborious, and expensive, and involve many subjective factors that may affect their accuracy. Objective: Most methods for predicting lncRNA-protein interactions (LPIs) rely on a single feature or on noisy features. To address this problem, we propose CSALPI, a computational model based on a deep neural network. Methods: First, the model uses cosine similarity to extract lncRNA-lncRNA and protein-protein similarity features, which are then denoised with a sparse autoencoder. Second, a neighbor enhancement autoencoder reconstructs the denoised features so that neighboring nodes receive similar representations. Finally, a Light Gradient Boosting Machine (LightGBM) classifier predicts potential LPIs. Results: Under 5-fold cross-validation, CSALPI performed well across multiple evaluation metrics. In a case study, the model correctly predicted 7 of 10 disease-associated lncRNA-protein pairs. Conclusion: CSALPI can serve as an effective complement to biological experiments for predicting potential LPIs.
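The first CSALPI step, extracting cosine-similarity features, can be illustrated with a short sketch. The abstract does not say which vectors are compared, so this example assumes, hypothetically, that each lncRNA is represented by its row of a binary lncRNA-protein interaction matrix and each protein by its column; the toy matrix is invented for illustration. The sparse autoencoder, neighbor enhancement autoencoder and LightGBM stages are not shown.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between equal-length vectors.

    Pairs involving an all-zero vector get similarity 0.0.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # undefined for zero vectors; leave as 0.0
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


# Hypothetical toy data: 3 lncRNAs x 4 proteins, 1 = known interaction.
interactions = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
lnc_sim = cosine_similarity_matrix(interactions)              # lncRNA-lncRNA (rows)
prot_sim = cosine_similarity_matrix(transpose(interactions))  # protein-protein (columns)
```

Here lncRNAs 0 and 1 share two interaction partners, giving a similarity of 2/√6 ≈ 0.816, while lncRNA 2 shares none with lncRNA 0 and scores 0.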
Citations: 0
MSSD: An Efficient Method for Constructing Accurate and Stable Phylogenetic Networks by Merging Subtrees of Equal Depth
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-10-04 · DOI: 10.2174/0115748936256923230927081102
Jiajie Xing, Xu Song, Meiju Yu, Juan Wang, Jing Yu
Background: Phylogenetic networks are essential for studying the evolutionary relationships and diversity among species, and are particularly important for capturing non-tree-like processes that result from reticulate evolutionary events. Phylogenetic trees represent the vertical evolutionary history of genes, but reticulate evolutionary events cause the trees of different genes to differ; phylogenetic networks can represent these reticulate processes and show the differences between rooted gene trees. However, the results of existing network-construction methods depend on the order of the inputs, so different input orders can lead to inconsistent results. Moreover, constructing a network for large datasets is time-consuming, and the resulting network often does not include all nodes of the input trees. Aims: This paper proposes MSSD, a method that constructs a phylogenetic network from gene trees by Merging Subtrees with the Same Depth in a bottom-up manner. Methods: MSSD first decomposes the input trees into subtrees according to depth. It then merges subtrees of the same depth, from depth 0 up to the maximum depth: for all subtrees of a given depth, each subtree is inserted into the current networks by means of identical subtrees. Results: MSSD was tested on simulated and real data. The experimental results show that the networks constructed by MSSD can represent all input trees and that MSSD is more stable than other methods. MSSD also constructs networks faster, and its networks share more information with the input trees than those constructed by other methods. Conclusion: MSSD is a powerful tool for studying evolutionary relationships among species and is freely available at https://github.com/xingjiajie2023/MSSD.
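The decomposition step, grouping subtrees by depth and recognizing identical subtrees across input trees, might look like the sketch below. Several points are assumptions rather than details from the abstract: trees are nested tuples of leaf names, a subtree's "depth" is taken to be its height above the leaves (so leaves have depth 0, matching a bottom-up pass), and two subtrees count as identical when they have the same topology and leaf labels regardless of child order. All function names are hypothetical, and the actual insertion of subtrees into a network, including how reticulations are created, is not shown.

```python
from collections import defaultdict


def canonical(tree):
    """Child-order-independent form of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return tree  # a leaf (taxon name)
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def height(tree):
    """Assumed meaning of 'depth': leaves are 0, a parent is 1 + its tallest child."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(child) for child in tree)


def subtrees_by_depth(trees):
    """Collect every subtree of every input tree, grouped by depth.

    Identical subtrees from different trees are stored once (set semantics),
    which is where shared structure between gene trees shows up.
    """
    levels = defaultdict(set)

    def visit(t):
        levels[height(t)].add(t)
        if not isinstance(t, str):
            for child in t:
                visit(child)

    for tree in trees:
        visit(canonical(tree))
    return dict(levels)
```

For the two hypothetical gene trees `(("A", "B"), "C")` and `("C", ("B", "A"))`, depth 0 holds the leaves A, B and C, depth 1 holds the single shared cherry `("A", "B")`, and depth 2 holds one root subtree. Because each depth level is a set of canonical subtrees, the grouping is the same for any ordering of the input trees; the abstract does not say whether MSSD obtains its reported stability in this way.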
Citations: 0
iPSI(2L)-EDL: a Two-layer Predictor for Identifying Promoters and their Types based on Ensemble Deep Learning
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-10-02 · DOI: 10.2174/0115748936264316230926073231
Xuan Xiao, Zaihao Hu, ZhenTao Luo, Zhaochun Xu
Abstract: Promoters are DNA fragments located near the transcription start site; according to their transcriptional activation and expression levels, they can be divided into strong and weak promoters. Identifying promoters and their strength in DNA sequences is essential for understanding gene expression regulation, so further improving the predictive quality of promoter predictors is important for real-world applications. Here, we constructed an up-to-date training dataset from RegulonDB, in which all promoters have been experimentally validated and share less than 85% sequence similarity. DNA sequence samples were represented with one-hot encoding and nucleotide chemical property and density (NCPD). We further propose an ensemble deep learning framework comprising a multi-head attention module, a long short-term memory (LSTM) module, and a convolutional neural network (CNN) module. The results show that iPSI(2L)-EDL outperforms existing methods both in promoter prediction and in distinguishing strong from weak promoters. In promoter identification, iPSI(2L)-EDL improved AUC and MCC on independent test data by 2.23% and 2.96%, respectively, over PseDNC-DL; in predicting promoter strength, its AUC and MCC increased by 3.74% and 5.86%, respectively. Ablation experiments indicate that the CNN module plays a crucial role in recognizing promoters, and that capturing the importance of different input positions and long-range dependencies among features also aids promoter recognition. Furthermore, to make it easier for experimental scientists to obtain the results they need, a user-friendly web server has been established and can be accessed at http://47.94.248.117/IPSW(2L)-EDL.
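The two sequence encodings named above can be sketched as follows. The NCPD table is an assumption: it uses the nucleotide chemical property scheme common in the sequence-prediction literature (A = (1,1,1), C = (0,1,0), G = (1,0,0), T = (0,0,1)) plus a cumulative density term, and may differ from the paper's exact implementation. The function names are hypothetical, the input is assumed to contain only A/C/G/T, and the attention, LSTM and CNN modules are not shown.

```python
# Assumed NCP table, one bit each for: purine ring, amino group, weak (2-bond) pairing.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}
BASES = "ACGT"


def one_hot(seq):
    """One 4-element indicator vector per position, in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def ncpd(seq):
    """Per position: 3 chemical-property bits plus the nucleotide's density,
    i.e. its frequency within the prefix seq[:i + 1]."""
    counts = {b: 0 for b in BASES}
    features = []
    for i, base in enumerate(seq.upper()):
        counts[base] += 1
        density = counts[base] / (i + 1)
        features.append(list(NCP[base]) + [density])
    return features


def encode(seq):
    """Concatenate one-hot (4) and NCPD (4) features: 8 channels per position."""
    return [oh + nc for oh, nc in zip(one_hot(seq), ncpd(seq))]
```

For example, in `ACA` the last A has density 2/3, because two of the first three nucleotides are A. Concatenating the two encodings is one plausible way to feed both representations to a neural network; the abstract does not say how the paper combines them.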
Citations: 0
Predicting the risk of breast cancer recurrence and metastasis based on miRNA expression
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-09-14 · DOI: 10.2174/1574893618666230914105741
Yaping Lv, Yanfeng Wang, Yumeng Zhang, Shuzhen Chen, Yuhua Yao
Background: Breast cancer patients remain at risk of recurrence and metastasis even after surgery. Accurately predicting each patient's risk of recurrence and metastasis is therefore critical, as it can help determine appropriate adjuvant therapy. Methods: This study investigates and compares the performance of several categories of molecular biomarkers, namely microRNA (miRNA), long non-coding RNA (lncRNA), messenger RNA (mRNA), and copy number variation (CNV), in predicting the risk of breast cancer recurrence and metastasis. First, molecular data (miRNA, lncRNA, mRNA, and CNV) for 483 breast cancer patients were downloaded from The Cancer Genome Atlas (TCGA) and randomly divided into training and test sets at a 7:3 ratio. Second, features (e.g., 15 miRNAs) were selected on the training set by univariate and multivariate Cox regression analysis. Using the selected features, a random forest classifier and several other classifiers were built to predict the recurrence and metastasis label. Finally, the performance of the classification models was compared and evaluated on the test set. Results: The miRNA-based model achieved an area under the ROC curve (AUC) of 0.70, better than the models using the other biomarkers. Conclusion: These results indicate that miRNA expression can provide important guidance for predicting breast cancer recurrence and metastasis.
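The data split and evaluation metric described above can be sketched with the Python standard library. This is an illustration, not the authors' code: the seed and function names are hypothetical, the split is a plain random 7:3 split (the abstract does not mention stratification), and the Cox-based feature selection and random forest training are omitted. AUC is computed as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as one half.

```python
import random


def split_70_30(sample_ids, seed=0):
    """Randomly split sample IDs into training and test sets at a 7:3 ratio.

    The seed is a hypothetical choice for reproducibility; no stratification.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.7 * len(ids))
    return ids[:n_train], ids[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = recurrence/metastasis, 0 = none; scores: classifier outputs.
    Each positive/negative pair counts 1 if the positive scores higher, 0.5 on a tie.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With 483 patients, the split yields 338 training and 145 test samples, and `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75.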
Citations: 0
Investigation of LncRNAs Expression as a Potential Biomarker in the Diagnosis and Treatment of Human Brucellosis
Zone 3 (Biology) · Q3 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2023-09-14 · DOI: 10.2174/1574893618666230914160213
Mansoor Kodori, Mohammad Abavisani, Hadis Fathizadeh, Mansoor Khaledi, Mohammad Hossein Haddadi, Shahrbanoo Keshavarz Aziziraftar, Foroogh Neamati, Amirhossein Sahebkar
Abstract: Long non-coding RNAs (LncRNAs) play significant roles in bacterial infections and host defense responses, representing a novel class of gene regulators beyond conventional protein-coding genes. This narrative review explores the involvement of LncRNAs as potential biomarkers in the diagnosis and treatment of bacterial infections, with a specific focus on Brucella infections. A comprehensive literature review was conducted to identify studies examining the roles of LncRNAs in immune responses during bacterial infections, with particular emphasis on Brucella; PubMed, Scopus, and other major scientific databases were searched using relevant keywords. LncRNAs are crucial regulators of immune responses to bacterial infections, influencing transcription factors, pro-inflammatory cytokines, and immune cell behavior, with both positive and negative effects. The NF-κB pathway is a key regulator of many LncRNAs in bacterial infections. During Brucella infections, essential LncRNAs activate the innate immune response, increasing pro-inflammatory cytokine production and immune cell differentiation. LncRNAs are associated with human brucellosis and hold promise for screening, diagnostics, and therapeutics, but further research is needed to fully understand their precise functions in Brucella infection and pathogenesis. Specific LncRNAs, such as IFNG-AS1 and NLRP3, are upregulated during brucellosis, while others, such as Gm28309, are downregulated, influencing immunosuppression and bacterial survival. The prognostic and therapeutic potential of Brucella-related LncRNAs warrants further investigation, including their roles in other immune cells responsible for bacterial clearance, such as macrophages, dendritic cells, and neutrophils. Unraveling the intricate relationship between LncRNAs and brucellosis may reveal novel regulatory mechanisms and clarify the roles of LncRNAs in infection regulation, expediting diagnostics and enhancing therapeutic strategies against Brucella infections.
{"title":"Investigation of LncRNAs Expression as a Potential Biomarker in the Diagnosis and Treatment of Human Brucellosis","authors":"Mansoor Kodori, Mohammad Abavisani, Hadis Fathizadeh, Mansoor Khaledi, Mohammad Hossein Haddadi, Shahrbanoo Keshavarz Aziziraftar, Foroogh Neamati, Amirhossein Sahebkar","doi":"10.2174/1574893618666230914160213","DOIUrl":"https://doi.org/10.2174/1574893618666230914160213","url":null,"abstract":"Abstract: Long non-coding RNAs (LncRNAs) are significant contributors to bacterial infections and host defense responses, presenting a novel class of gene regulators beyond conventional protein-coding genes. This narrative review aimed to explore the involvement of LncRNAs as a potential biomarker in the diagnosis and treatment of bacterial infections, with a specific focus on Brucella infections. A comprehensive literature review was conducted to identify relevant studies examining the roles of LncRNAs in immune responses during bacterial infections, with a specific emphasis on Brucella infections. PubMed, Scopus and other major scientific databases were searched using relevant keywords. LncRNAs crucially regulate immune responses to bacterial infections, influencing transcription factors, pro-inflammatory cytokines, and immune cell behavior, with both positive and negative effects. The NF-κB pathway is a key regulator for many LncRNAs in bacterial infections. During Brucella infections, essential LncRNAs activate the innate immune response, increasing proinflammatory cytokine production and immune cell differentiation. LncRNAs are associated with human brucellosis, holding promise for screening, diagnostics, or therapeutics. Further research is needed to fully understand LncRNAs' precise functions in Brucella infection and pathogenesis. Specific LncRNAs, like IFNG-AS1 and NLRP3, are upregulated during brucellosis, while others, such as Gm28309, are downregulated, influencing immunosuppression and bacterial survival. 
Investigating the prognostic and therapeutic potential of Brucella-related LncRNAs warrants ongoing investigation, including their roles in other immune cells like macrophages, dendritic cells, and neutrophils responsible for bacterial clearance. Unraveling the intricate relationship between LncRNAs and brucellosis may reveal novel regulatory mechanisms and LncRNAs' roles in infection regulation, expediting diagnostics and enhancing therapeutic strategies against Brucella infections.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134969910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Thorough Assessment of Machine Learning Techniques for Predicting Protein-Nucleic Acid Binding Hot Spots
Zone 3 (Biology); Q3 BIOCHEMICAL RESEARCH METHODS; Pub Date: 2023-09-13; DOI: 10.2174/1574893618666230913090436
Xianzhe Zou, Chen Zhang, Mingyan Tang, Lei Deng
Background: Proteins and nucleic acids are vital biomolecules that contribute significantly to biological life. Precise and efficient identification of hot spots at protein-nucleic acid interfaces is crucial for guiding drug development, advancing protein engineering, and exploring the underlying mechanisms of molecular recognition. Because experimental methods such as alanine scanning mutagenesis are time-consuming and expensive, a growing number of machine learning techniques are being employed to predict hot spots. However, existing approaches are characterized by a lack of uniform standards, a scarcity of data, and highly diverse features. There is currently no comprehensive overview or evaluation of this field, so a full overview and review is extremely helpful. Methods: In this study, we present an overview of cutting-edge machine learning approaches used for hot spot prediction in protein-nucleic acid complexes. Additionally, we outline the feature categories currently in use, derived from relevant biological data sources, and assess conventional feature selection methods based on 600 extracted features. We also create two new benchmark datasets, PDHS87 and PRHS48, and develop distinct binary classification models based on these datasets to evaluate the advantages and disadvantages of various machine learning techniques. Results: Predicting protein-nucleic acid interaction hot spots is a challenging task. The study demonstrates that structural neighborhood features play a crucial role in identifying hot spots. Prediction performance can be improved by choosing effective feature selection and machine learning methods. Among the existing prediction methods, XGBPRH has the best performance. Conclusion: It is crucial to continue studying hot spot theories, discover new and effective features, add accurate experimental data, and utilize DNA/RNA information. Semi-supervised learning, transfer learning, and ensemble learning can optimize predictive ability, and combining computational docking with machine learning methods may further improve predictive performance.
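The abstract says the authors assessed conventional feature selection methods over 600 extracted features, but it does not name those methods. The sketch below illustrates one conventional filter method under stated assumptions. It ranks features by the F-score of Chen and Lin, which compares between-class scatter with within-class scatter for hot spot residues (label 1) and non-hot-spot residues (label 0). The choice of F-score, the function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def f_score(pos: list[float], neg: list[float]) -> float:
    """F-score (Chen & Lin) of one feature; each class needs >= 2 samples.

    Larger values mean the feature separates the positive class (hot spots)
    from the negative class (non-hot spots) more strongly.
    """
    overall = mean(pos + neg)
    pos_mean, neg_mean = mean(pos), mean(neg)
    # Between-class scatter: how far each class mean lies from the overall mean.
    between = (pos_mean - overall) ** 2 + (neg_mean - overall) ** 2
    # Within-class scatter: sum of the two sample variances.
    within = (sum((x - pos_mean) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - neg_mean) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        # Each class is constant: infinite score if the class means differ, 0 otherwise.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X: list[list[float]], y: list[int]) -> list[int]:
    """Return feature indices sorted from most to least discriminative (y: 1 = hot spot)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 4 residues x 2 features.
    X = [[1.0, 0.5], [1.2, 0.4], [0.1, 0.6], [0.0, 0.5]]
    y = [1, 1, 0, 0]
    print(rank_features(X, y))
```

In a filter approach like this, the top-k ranked features would be kept and passed to the classifier. The cutoff k is usually tuned by cross-validation.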
Citations: 0
Drug-Target Interaction Prediction By Combining Transformer and Graph Neural Networks
Zone 3 (Biology); Q3 BIOCHEMICAL RESEARCH METHODS; Pub Date: 2023-09-12; DOI: 10.2174/1574893618666230912141426
Junkai Liu, Yaoyao Lu, Shixuan Guan, Tengsheng Jiang, Yijie Ding, Qiming Fu, Zhiming Cui, Hongjie Wu
Background: The prediction of drug-target interactions (DTIs) plays an essential role in drug discovery. Recently, deep learning methods have been widely applied to DTI prediction. However, most existing studies do not fully utilize the molecular structures of drug compounds and the sequence structures of proteins, which prevents these models from obtaining precise and effective feature representations. Methods: In this study, we propose a novel deep learning framework that combines transformer and graph neural networks to predict DTIs. Our model uses graph convolutional neural networks to capture the global and local structural information of drugs, and convolutional neural networks to capture the sequence features of targets. The resulting drug and protein representations are then fed into separate multi-layer transformer encoders, which integrate their features and generate the final representations. Results: Experiments on benchmark datasets demonstrated that our model outperforms previous graph-based and transformer-based methods, improving precision by 1.5% and 1.8% and recall by 0.2% and 1.0%, respectively. These results indicate that the transformer encoders effectively extract feature information from both drug compounds and proteins. Conclusion: Overall, our proposed method validates the applicability of combining graph neural networks and transformer architectures in drug discovery, and its attention mechanisms enable it to extract deep structural features of drugs and proteins.
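For readers unfamiliar with graph convolution on molecules, the sketch below shows one standard GCN propagation step, the Kipf and Welling rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The step is followed by mean pooling into a molecule-level vector. It is a minimal pure-Python sketch under stated assumptions:
- The three-atom graph, the one-hot atom features and the identity weight matrix are hypothetical.
- The abstract does not specify the authors' layer definition, readout, or how the GCN output is passed to the transformer encoders, so this is not their implementation.

```python
import math


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product a @ b for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: list[list[int]]) -> list[list[float]]:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a molecular graph."""
    n = len(adj)
    # Self-loops let every atom keep part of its own features.
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W); H is atoms x features."""
    out = matmul(matmul(normalize_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in out]


def mean_readout(h):
    """Average the atom embeddings into one molecule-level vector."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Hypothetical heavy-atom chain C-C-O with one-hot features [is_C, is_O].
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights stand in for learned ones
    print(mean_readout(gcn_layer(adj, h, w)))
```

Stacking several such layers lets each atom aggregate information from progressively more distant neighbours. In a trained model, W would be learned rather than fixed.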
Citations: 0
Prediction of Plant Ubiquitylation Proteins and Sites by Fusing Multiple Features
Zone 3 (Biology); Q3 BIOCHEMICAL RESEARCH METHODS; Pub Date: 2023-09-08; DOI: 10.2174/1574893618666230908092847
Meng-Yue Guan, Wang-Ren Qiu, Qian-Kun Wang, Xuan Xiao
Introduction: Protein ubiquitylation is an important post-translational modification (PTM) and is considered one of the most important processes regulating cell function and various diseases. Accurate prediction of ubiquitylation proteins and their PTM sites is therefore of great significance for studying basic biological processes and developing related drugs. Researchers have developed several large-scale computational methods to predict ubiquitylation sites, but there is still much room for improvement. Much of the research on ubiquitylation is cross-species, whereas life forms are diverse, and prediction methods tend to show species specificity in practical applications. This study therefore focuses on plants and constructs computational methods for identifying ubiquitylation proteins and ubiquitylation sites. Method: In this work, we constructed two predictive models to identify plant ubiquitylation proteins and sites. The first is the ubiquitylation protein prediction model. To better reflect protein sequence information and obtain better prediction results, it extracts features with a KNN scoring matrix model based on functional-domain Gene Ontology (GO) annotations and with word embedding models, namely Skip-Gram and Continuous Bag of Words (CBOW). The light gradient boosting machine (LGBM) is selected as its prediction engine. Results: In 10-fold cross-validation on the independent dataset, the accuracy (ACC), precision, recall, F1-score, and AUC were 85.12%, 80.96%, 72.80%, 76.37%, and 0.9193, respectively. In the ubiquitylation site prediction model, Skip-Gram, CBOW, and enhanced amino acid composition (EAAC) feature encodings were used to extract features from protein sequence fragments, and the predictions on both the training and independent test data also achieved good performance. Conclusion: Overall, the comparison results demonstrate that our models have a clear advantage in predicting ubiquitylation proteins and sites, and they may provide useful insights for studying the mechanisms and modulation of ubiquitination pathways.
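EAAC (enhanced amino acid composition) is commonly defined as an amino acid composition computed in a window that slides along a sequence fragment. The sketch below follows that common sliding-window definition under stated assumptions:
- The window is 5 residues.
- '-' is the padding character for fragments near sequence termini.
- The lysine-centred fragment is hypothetical.

The abstract does not state the window size, fragment length, or padding scheme the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def eaac(fragment: str, window: int = 5) -> list[float]:
    """Sliding-window amino acid composition of a peptide fragment.

    For every window position, appends the frequency of each of the 20
    standard amino acids. Padding characters ('-') are ignored both in the
    counts and in the denominator.
    """
    if not 1 <= window <= len(fragment):
        raise ValueError("window must be between 1 and the fragment length")
    features: list[float] = []
    for start in range(len(fragment) - window + 1):
        residues = fragment[start:start + window].replace("-", "")
        n = len(residues)
        for aa in AMINO_ACIDS:
            # An all-padding window contributes zeros instead of dividing by 0.
            features.append(residues.count(aa) / n if n else 0.0)
    return features


if __name__ == "__main__":
    # Hypothetical 7-residue fragment centred on a lysine (K), padded at both ends.
    vec = eaac("--AKAG-")
    print(len(vec))  # prints 60 (3 windows x 20 amino acids)
```

Each window contributes 20 values, so a fragment of length L yields 20 × (L − window + 1) features. These could then be combined with the word-embedding features before classification.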
Citations: 1
Identification and Functional Prediction of lncRNAs using Bioinformatic Techniques
Zone 3 (Biology); Q3 BIOCHEMICAL RESEARCH METHODS; Pub Date: 2023-09-07; DOI: 10.2174/1574893618666230907165829
Shizuka Uchida
Citations: 0
Journal
Current Bioinformatics