Current Bioinformatics: Latest Articles

RDR100: an effective computational method for identifying Kruppel-like factors
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-09-05 | DOI: 10.2174/1574893618666230905102407
Adeel Malik, Jamal S. M. Sabir, M. Kamli, Thi Phan Le, Chang-Bae Kim, Balachandran Manavalan
Krüppel-like factors (KLFs) are a family of zinc finger-containing transcription factors that regulate various cellular processes. KLF proteins are associated with human diseases such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Given their involvement in important biological functions, accurate identification and annotation of KLF proteins is crucial. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive. In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins from their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained their respective models using five distinct machine learning (ML) classifiers. The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation. Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web server is available at https://procarb.org/RDR100/.
Citations: 0
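The RDR100 entry above describes predicting KLF proteins from primary sequences through feature encodings, but the abstract does not name the ten feature types. As a minimal sketch of what a sequence encoding can look like, the snippet below computes amino acid composition, a familiar fixed-length encoding chosen here purely for illustration. The function name and the toy sequence are hypothetical and not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative encoding only; the abstract does not state that RDR100
    uses this feature.
    """
    seq = sequence.upper()
    # Count only standard residues; anything else (e.g. 'X') is ignored.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real KLF protein.
features = amino_acid_composition("MKRCCHTGEKPY")
```

A vector like `features` is the kind of input that a feature-selection step and a classifier such as a random forest would then consume.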
In Silico Study of Clinical Prognosis Associated MicroRNAs for patients with Metastasis in Clear Cell Renal Carcinoma
Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-09-05 | DOI: 10.2174/1574893618666230905154441
Ezra B. Wijaya, Venugopala Reddy Mekala, Efendi Zaenudin, Ka-Lok Ng
Background: Metastasis involves multiple stages and various genetic and epigenetic alterations. MicroRNA has been investigated as a biomarker and prognostic tool in various cancer types and stages. Nevertheless, exploring the role of miRNA in kidney cancer remains a significant challenge, given the ability of a single miRNA to target multiple genes within biological networks and pathways. Objective: This study proposes a computational research framework built on the hypothesis that a set of miRNAs functions as key regulators in modulating the gene expression networks associated with kidney cancer survival. Method: We retrieved NGS data for the TCGA-KIRC cohort from UCSC Xena. A set of prognostic miRNAs was acquired through multiple Cox regression analyses. We used machine learning approaches to evaluate how well the prognostic miRNAs classify normal, primary (M0), and metastatic (M1) samples. The molecular mechanisms distinguishing primary cancer from metastasis were investigated by identifying the regulatory networks of the miRNAs' target genes. Result: A total of 14 miRNAs were identified as potential prognostic indicators. A combination of high-expression miRNAs was associated with survival probability. Machine learning achieved an average accuracy of 95% in distinguishing primary cancer from normal tissue and 79% in predicting metastasis from primary tissue. Correlation analysis between the prognostic miRNAs and their target genes revealed differences in regulatory networks between metastatic and primary tissues. Conclusion: This study identified 14 miRNAs that could potentially serve as vital biomarkers for the diagnosis and prognosis of ccRCC. The differential regulatory networks between metastatic and primary tissues found in this study provide a molecular basis for the assessment and therapeutic treatment of ccRCC patients.
Citations: 0
Transformer and graph transformer-based prediction of drug-target interactions
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-08-25 | DOI: 10.2174/1574893618666230825121841
Weizhong Lu, Meiling Qian, Yu Zhang, Junkai Liu, Hongjie Wu, Yaoyao Lu, Haiou Li, Qiming Fu, Jiyun Shen, Yongbiao Xiao
Discovering new pharmaceuticals requires a great deal of time and money, which has motivated the search for more effective approaches to finding drugs. Researchers have recently made significant progress in using deep learning (DL) for drug-target interaction (DTI) prediction. We therefore propose a DL model that applies the Transformer to DTI prediction. The model uses a Transformer and a Graph Transformer to extract the feature information of proteins and compound molecules, respectively, and combines their representations to predict interactions. We evaluated the proposed method on two benchmark datasets, Human and C. elegans, in different experimental settings and compared it with the latest DL model. The results show that the proposed DL-based model is an effective method for DTI prediction, and that its performance on the two datasets is significantly better than that of other DL-based methods.
Citations: 0
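The drug-target interaction entry above relies on Transformer encoders. The core operation of a Transformer layer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below shows that operation alone in plain Python, assuming single-head attention with no learned projection matrices; a full model such as the one in the abstract would add learned projections, multiple heads, and (for the Graph Transformer) graph structure, none of which are reproduced here. The token vectors are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns (outputs, weights); weights[i][j] is how strongly
    query i attends to key j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy "sequence" of three 2-dimensional token embeddings (illustrative values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

In self-attention the same token matrix serves as queries, keys, and values, which is why `tokens` is passed three times.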
DeepEpi: Deep Learning Model for Predicting Gene Expression Regulation based on Epigenetic Histone Modifications
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-08-18 | DOI: 10.2174/1574893618666230818121046
Rania Hamdy, Yasser M. K. Omar, F. Maghraby
Histone modification is a vital element in gene expression regulation. The way in which these proteins bind to the DNA affects whether or not a gene may be expressed. Although those factors cannot influence DNA construction, they can influence how it is transcribed. Each spatial location in DNA has its own function, so the spatial arrangement of chromatin modifications affects how a gene can be expressed. Gene regulation is also affected by the type of histone modification combinations present on the gene, and it depends on the spatial distribution pattern of these modifications and on how long these modifications are read over a gene region. This study therefore aims to model long-range spatial genome data and the complex dependencies among histone reads. A convolutional neural network (CNN) is used to model all data features in this paper. It can detect patterns in histone signals and preserve the spatial information of these patterns. The study also uses the concept of memory from long short-term memory (LSTM) networks, employing vanilla LSTM, bidirectional LSTM, or stacked LSTM to preserve long-range histone signals. Additionally, it combines these methods using ConvLSTM, or uses them together with the aid of self-attention. Based on the results, the combination of CNN and LSTM with the self-attention mechanism obtained an area under the curve (AUC) score of 88.87% over 56 cell types. The result outperforms the present state-of-the-art model and provides insight into how combinatorial interactions between histone modification marks can control gene expression. The source code is available at https://github.com/RaniaHamdy/DeepEpi.
Citations: 0
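The DeepEpi entry above reports an AUC of 88.87% over 56 cell types. As a reminder of what that metric measures, the sketch below computes the ROC AUC for a binary task as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, then averages it over cell types. Averaging per-cell-type AUCs is an assumption about how the reported figure was aggregated, and the cell-type names and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(per_cell_type):
    """Average AUC over a dict of {cell_type: (labels, scores)}."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_cell_type.values()]
    return sum(aucs) / len(aucs)


# Hypothetical predictions for two made-up cell types.
toy = {
    "cell_type_A": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "cell_type_B": ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
}
average = mean_auc(toy)
```

The pairwise form is quadratic in the number of examples; it is used here for clarity rather than speed.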
scSDSC: Self-supervised Deep Subspace Clustering for scRNA-seq Data
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-08-16 | DOI: 10.2174/1574893618666230816090443
Jian-ping Zhao, Bo Yang, Hai-yun Wang, Chunhan Zheng
Single-cell RNA sequencing (scRNA-seq) data can reveal heterogeneity between cells, making it possible to identify cell types and discover rare ones. Clustering is often used to identify cell types, but the high noise and high dimensionality of scRNA-seq data degrade clustering performance and affect downstream analysis. Deep learning is widely used in this field and offers promising performance in feature learning. Most deep learning models consider only the relationships between genes and ignore the relationships between cells. We use both the relationships between cells and the relationships between genes to construct clustering models. We propose scSDSC, a deep subspace clustering architecture that considers the relationships between genes and between cells at the same time. As in deep subspace clustering (DSC), we add a fully connected layer after the embedding layer to obtain the self-expression matrix. In addition, we add a fully connected SoftMax layer to generate pseudo-labels and use the information carried by the pseudo-labels for model training. Finally, the affinity matrix is obtained for spectral clustering. Experimental results on eight real datasets show that scSDSC outperforms existing methods in downstream analysis. Our method plays an important role in improving clustering accuracy and downstream analysis.
Citations: 0
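The scSDSC entry above goes from a learned self-expression matrix to an affinity matrix for spectral clustering without saying how the affinity is built. A common convention in the subspace clustering literature is to symmetrize the absolute coefficients, A = (|C| + |C|^T) / 2. The sketch below shows that convention only; whether scSDSC uses exactly this construction is an assumption, and the coefficient values are invented.

```python
def affinity_from_self_expression(C):
    """Build a symmetric, non-negative affinity matrix from coefficients C.

    C[i][j] is the weight with which sample j helps reconstruct sample i.
    Uses the common (|C| + |C|^T) / 2 convention, assumed here.
    """
    n = len(C)
    if any(len(row) != n for row in C):
        raise ValueError("self-expression matrix must be square")
    return [
        [0.5 * (abs(C[i][j]) + abs(C[j][i])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical self-expression coefficients for three cells
# (zero diagonal: a cell is not used to reconstruct itself).
C = [
    [0.0, 0.8, -0.1],
    [0.6, 0.0, 0.0],
    [0.2, -0.4, 0.0],
]
A = affinity_from_self_expression(C)
```

The resulting matrix `A` is what a spectral clustering routine would take as a precomputed affinity.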
A Metric to Characterize Differentially Methylated Region Sets Detected from Methylation Array Data
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-08-16 | DOI: 10.2174/1574893618666230816141723
Xiaoqing Peng, Wanxin Cui, Wenjin Zhang, Zihao Li, Xiaoshu Zhu, L. Yuan, Ji Li
Identifying differentially methylated regions (DMRs) is a basic but important task in epigenomics that can help investigate disease mechanisms and provide methylation biomarkers for disease screening. A number of methods have been proposed to identify DMRs from methylation array data. However, effective metrics for characterizing different DMR sets and comparing them directly are lacking. In this study, we introduce a metric, DMRn, to characterize DMR sets detected by different methods from methylation array data. To calculate DMRn, the methylation differences of DMRs are first recalculated by incorporating the correlations between probes and the CpGs they represent. DMRn is then calculated from the number of probes and the density of CpGs in DMRs whose methylation differences fall in each interval. A comparison of the DMRn values of DMR sets predicted by seven methods in four scenarios demonstrates that DMRn can provide effective guidance for selecting DMR sets, and that comparing DMR sets from related pathological states can provide new insights in cancer genomics studies. For example, many regions with subtle methylation alterations in subtypes of prostate cancer are altered in the opposite direction in the benign state, which may indicate a possible revision mechanism in benign prostate cancer. Furthermore, when applied to datasets that underwent different runs of batch effect removal, DMRn can help visualize the bias introduced by multiple runs of batch effect removal. The tool for calculating DMRn is available in the GitHub repository (https://github.com/xqpeng/DMRArrayMetric).
Citations: 0
Drug–target binding affinity prediction based on three-branched multiscale convolutional neural networks
IF 4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2023-08-16 | DOI: 10.2174/1574893618666230816090548
Yaoyao Lu, Junkai Liu, T. Jiang, Zhiming Cui, Hongjie Wu
Developing new drugs is costly and time-consuming and is often accompanied by safety concerns. With the development of deep learning, computer-aided drug design has become more mainstream, and convolutional neural networks and graph neural networks have been widely used for drug–target affinity (DTA) prediction. This paper proposes a method for predicting DTA using graph convolutional networks and multiscale convolutional neural networks. We represent drug molecules as graphs and learn their feature representations through graph attention networks and graph convolutional networks. A three-branch convolutional neural network learns the local and global features of protein sequences, and the two feature representations are merged and passed to a regression module to predict the DTA. We present a novel model for DTA prediction that, compared with DeepDTA on the Davis dataset, achieves a 2.5% improvement in the consistency index and a 21% improvement in mean squared error. Moreover, our method outperformed other mainstream DTA prediction models, namely GANsDTA, WideDTA, GraphDTA, and DeepAffinity. The results show that multiscale convolutional neural networks capture protein signatures better than a single-branch convolutional neural network, and that representing drug molecules as graphs yields better results.
Citations: 0
TumorDet: A Breast Tumor Detection Model Based on Transfer Learning and ShuffleNet
IF 4 | Zone 3, Biology | Q3 Biochemical Research Methods | Pub Date: 2023-08-15 | DOI: 10.2174/1574893618666230815121150
Leying Pan, T. Zhang, Qiang Yang, Guoping Yang, Nan Han, Shaojie Qiao
Breast tumors are among the most malignant tumors, and early detection can improve patient survival rates. Currently, mammography is the most reliable method for diagnosing breast tumors because of its high image resolution. With the rapid development of medical and artificial intelligence techniques, computer-aided diagnosis can greatly improve the detection accuracy of breast tumors, and medical imaging has begun to use deep-learning-based approaches. In this study, the TumorDet model is proposed to detect benign and malignant breast tumor lesions, which can assist doctors in diagnosis. We use the proposed TumorDet to analyze and predict breast tumors on a real MRI dataset. (1) We introduce an adaptive gamma correction (AGC) method to equalize brightness and increase the contrast of mammography images; (2) we use the ShuffleNet model to exchange information between different feature layers and extract the hidden high-level features of medical images; and (3) we use transfer learning to fine-tune the ShuffleNet model and obtain the optimal parameters. The proposed TumorDet model reaches an accuracy, sensitivity, and specificity of 90.43%, 89.37%, and 87.81%, respectively. TumorDet performs well in the breast tumor detection task. In addition, we use TumorDet to conduct experiments on other tasks, such as forest fire detection, and the experimental results demonstrate its robustness. TumorDet employs the ShuffleNet model to exchange information between different feature layers without increasing the number of network parameters and applies transfer learning to further extract the basic features of medical images by fine-tuning. The model is beneficial for the localization and classification of breast tumors and also performs well in forest fire detection.
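The abstract does not say how the adaptive gamma correction (AGC) step chooses its exponent. As a rough illustration only, the sketch below uses one simple heuristic, which is an assumption and not the authors' method: pick gamma so that the mean normalized intensity maps to a target of 0.5. The function name and the sample image are hypothetical.

```python
import math


def adaptive_gamma(pixels, target_mean=0.5):
    """Gamma-correct an 8-bit grayscale image given as a list of rows.

    Assumed heuristic: choose gamma so that the image's mean normalized
    intensity is mapped to target_mean. Returns (gamma, corrected_image).
    """
    flat = [v / 255.0 for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    # Clamp the mean away from 0 and 1 so both logarithms stay finite and nonzero.
    mean = min(max(mean, 1e-6), 1.0 - 1e-6)
    gamma = math.log(target_mean) / math.log(mean)
    corrected = [
        [round(255 * (v / 255.0) ** gamma) for v in row]
        for row in pixels
    ]
    return gamma, corrected


# A dark hypothetical 2x2 image: gamma comes out below 1, which brightens it.
dark_image = [[16, 32], [64, 128]]
gamma, brightened = adaptive_gamma(dark_image)
```

With this heuristic, dark images get a gamma below 1 (brightening) and bright images get a gamma above 1 (darkening), while pure black and pure white are left unchanged.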
Citations: 0
Revealing ANXA6 as a Novel Autophagy-related Target for Pre-eclampsia Based on the Machine Learning
IF 4 | Zone 3, Biology | Q3 Biochemical Research Methods | Pub Date: 2023-08-07 | DOI: 10.2174/1574893618666230807123016
Baoping Zhu, Huizhen Geng, Fan Yang, Yanxin Wu, Tiefeng Cao, Dongyu Wang, Zilian Wang
Preeclampsia (PE) is a severe pregnancy complication associated with autophagy. This research sought to uncover autophagy-related genes in PE through bioinformatics and machine learning. The GEO series GSE75010 was subjected to WGCNA to identify key modular genes in PE. Autophagy genes retrieved from THANATOS were overlapped with the modular genes to yield PE-related autophagy genes. Two machine learning algorithms (LASSO and SVM-RFE) were then used for dimensionality reduction. The candidate gene was further verified by quantitative reverse transcription polymerase chain reaction, western blot, and immunohistochemistry. Preliminary experiments were conducted on the HTR-8/SVneo cell line to explore the role of candidate genes in autophagy regulation. WGCNA identified 291 genes from 5 hubs, and after overlapping with 1087 autophagy-related genes obtained from THANATOS, 42 PE-related autophagy-related genes (ARGs) were identified. ANXA6 was recognized as a potential target through SVM-RFE and LASSO analyses. The mRNA and protein expression of ANXA6 were verified in placenta samples. In HTR-8/SVneo cells, modulating ANXA6 expression altered autophagy levels. Knocking down ANXA6 resulted in an anti-autophagy effect, which was reversed by treatment with CAL101, an inhibitor of PI3K, Akt, and mTOR. We observed that ANXA6 may serve as a possible PE action target and that autophagy may be crucial to the pathogenesis of PE.
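The gene-filtering logic described here amounts to a chain of set overlaps, which can be sketched as follows. Apart from ANXA6, every gene name below is a hypothetical placeholder, and taking the intersection of the LASSO and SVM-RFE selections is an assumption about how the two analyses were combined.

```python
def overlap(*gene_sets):
    """Return the genes present in every input set, in sorted order."""
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


# Placeholder inputs standing in for the WGCNA module genes and the
# THANATOS autophagy gene list (hypothetical names except ANXA6).
module_genes = {"ANXA6", "GENE_A", "GENE_B", "GENE_C"}
autophagy_genes = {"ANXA6", "GENE_B", "GENE_D"}
pe_related_args = overlap(module_genes, autophagy_genes)

# Placeholder outputs of the two feature-selection methods.
lasso_selected = {"ANXA6", "GENE_B"}
svm_rfe_selected = {"ANXA6"}
candidates = overlap(pe_related_args, lasso_selected, svm_rfe_selected)
```

Requiring agreement between two independent selection methods is a conservative choice: it shrinks the candidate list at the cost of possibly discarding genes that only one method favors.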
Citations: 0
Full-length PacBio Amplicon Sequencing to Unveil RNA Editing Sites
IF 4 | Zone 3, Biology | Q3 Biochemical Research Methods | Pub Date: 2023-08-03 | DOI: 10.2174/1574893618666230803112142
Xiao-lu Zhu, Ming-ling Liao, Ya-Jie Zhu, Yun‐wei Dong
RNA editing enriches post-transcriptional sequence changes. Currently, RNA editing sites are mostly detected using Sanger sequencing or second-generation sequencing. However, Sanger detection is limited by disturbing background peaks when sequencing directly and by the number of clones when sequencing clones, while second-generation sequencing is constrained by its short reads. We aimed to design a pipeline that can accurately detect RNA editing sites in full-length long-read amplicons, for studies focusing on a few specific genes of interest. We developed a novel high-throughput RNA editing site detection pipeline based on PacBio circular consensus sequencing (CCS), which is accurate and offers high throughput and long-read coverage. We tested the pipeline on cytosolic malate dehydrogenase in the hard-shelled mussel Mytilus coruscus and further validated it using direct Sanger sequencing. Data generated from the PacBio CCS amplicons in three mussels were first filtered by quality and then selected by open reading frame. After filtering, 225-2047 sequences per mussel were used to identify RNA editing sites. With the corresponding genomic DNA sequences, we extracted 227-799 candidate RNA editing sites, excluding heterozygous sites. We further identified 7-11 final RNA editing sites (RESs) using a new error model specially designed for RNA editing site detection. The resulting RNA editing sites all agree with the Sanger sequencing validation. We report a near-zero error rate method for identifying RNA editing sites in long-read amplicons using PacBio CCS sequencing.
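The candidate-extraction step (compare cDNA reads with genomic DNA and skip heterozygous positions) can be sketched roughly as below. This is not the authors' pipeline or error model: the simple frequency threshold, the function name, and the toy sequences are all assumptions made for illustration, and reads are assumed to be pre-aligned and of equal length.

```python
from collections import Counter


def candidate_editing_sites(genomic_alleles, reads, min_fraction=0.1):
    """List positions where aligned cDNA reads differ from homozygous gDNA.

    genomic_alleles: one set of observed gDNA bases per position.
    reads: aligned cDNA reads, all the same length as genomic_alleles.
    Returns tuples of (position, genomic_base, read_base, fraction_of_reads).
    """
    sites = []
    for pos, alleles in enumerate(genomic_alleles):
        if len(alleles) != 1:
            continue  # heterozygous genomic site: a mismatch is not evidence of editing
        (ref,) = alleles
        counts = Counter(read[pos] for read in reads)
        # Most frequent non-genomic base at this position, if there is one.
        alt, alt_count = max(
            ((base, n) for base, n in counts.items() if base != ref),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt is not None and alt_count / len(reads) >= min_fraction:
            sites.append((pos, ref, alt, alt_count / len(reads)))
    return sites


# Toy data: position 2 is heterozygous in gDNA and is therefore skipped.
gdna = [{"A"}, {"C"}, {"A", "G"}, {"T"}]
cdna_reads = ["ACAT", "GCGT", "GCAT", "ACAT"]
sites = candidate_editing_sites(gdna, cdna_reads)
```

A fixed threshold like this cannot separate true editing from sequencing error at low frequencies, which is presumably why the abstract describes a dedicated error model for the final filtering step.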
Citations: 0