Briefings in Functional Genomics: Latest Articles

Emerging trends in functional genomics in Spiralia.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-17 | DOI: 10.1093/bfgp/elad048
José M Martín-Durán
Briefings in Functional Genomics, vol. 22, no. 6, pp. 485-486.
Citations: 0
Single-cell transcriptomics refuels the exploration of spiralian biology.
IF 2.5 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-17 | DOI: 10.1093/bfgp/elad038
Laura Piovani, Ferdinand Marlétaz

Spiralians represent the least studied superclade of bilaterian animals, despite exhibiting the widest diversity of organisms. Although spiralians include iconic organisms, such as octopus, earthworms and clams, a lot remains to be discovered regarding their phylogeny and biology. Here, we review recent attempts to apply single-cell transcriptomics, a pioneering technology enabling the classification of cell types and the characterisation of their gene expression profiles, to several spiralian taxa. We discuss the methodological challenges and requirements for applying this approach to marine organisms and explore the insights that such studies can bring, from both biomedical and evolutionary perspectives. For instance, we show that single-cell sequencing might help solve the riddle of the homology of larval forms across spiralians, and also help to better characterise and compare the processes of regeneration across taxa. We highlight the capacity of single-cell approaches to investigate the origin of evolutionary novelties, such as the mollusc shell or the cephalopod visual system, and also to interrogate the conservation of the molecular fingerprint of cell types at long evolutionary distances. We hope that single-cell sequencing will open a new window onto the biology of spiralians and help renew interest in these overlooked but captivating organisms.

Briefings in Functional Genomics, pp. 517-524. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10658179/pdf/
Citations: 0
Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad015
Hao Wu, Bing Zhou, Haoru Zhou, Pengyu Zhang, Meili Wang

The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.
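
The bagging step described in this abstract can be illustrated with a minimal sketch. It assumes a toy one-feature threshold classifier as a stand-in for each trained 1DCNN, since the abstract does not give the network details; `bootstrap_sample`, `fit_centroid_model` and `bagging_predict` are hypothetical names, not functions from the Be-1DCNN repository.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bagging replicate)."""
    return [rng.choice(data) for _ in data]

def fit_centroid_model(sample):
    """Toy stand-in for a trained 1DCNN: threshold midway between class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate replicate with a single class: fall back to the overall mean.
        threshold = sum(x for x, _ in sample) / len(sample)
    else:
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > threshold else 0.0

def bagging_predict(models, x):
    """Average the base-model outputs; the mean acts as the ensemble score."""
    return sum(model(x) for model in models) / len(models)

rng = random.Random(0)
# Hypothetical training pairs: (feature value, label), 1 = loop, 0 = non-loop.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = [fit_centroid_model(bootstrap_sample(train, rng)) for _ in range(25)]
score_high = bagging_predict(models, 0.95)
score_low = bagging_predict(models, 0.05)
```

Averaging over replicates trained on different bootstrap samples is what reduces the variance of any single base model; in the setting of the abstract each base model would be a 1DCNN with three convolutional layers and one dense layer.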

Briefings in Functional Genomics, pp. 475-484.
Citations: 0
Attention-based GCN integrates multi-omics data for breast cancer subtype classification and patient-specific gene marker identification.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad013
Hui Guo, Xiang Lv, Yizhou Li, Menglong Li

Breast cancer is a heterogeneous disease and can be divided into several subtypes with unique prognostic and molecular characteristics. The classification of breast cancer subtypes plays an important role in the precision treatment and prognosis of breast cancer. Benefitting from the relation-aware ability of a graph convolution network (GCN), we present a multi-omics integrative method, the attention-based GCN (AGCN), for breast cancer molecular subtype classification using messenger RNA expression, copy number variation and deoxyribonucleic acid methylation multi-omics data. In extensive comparative studies, our AGCN models outperform state-of-the-art methods under different experimental conditions, and both the attention mechanisms and the graph convolution subnetwork play an important role in accurate cancer subtype classification. The layer-wise relevance propagation (LRP) algorithm is used to interpret model decisions and can identify patient-specific important biomarkers that are reported to be related to the occurrence and development of breast cancer. Our results highlight the effectiveness of the GCN and attention mechanisms in multi-omics integrative analysis, and the implementation of the LRP algorithm can provide biologically reasonable insights into model decisions.
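
To make the graph convolution idea concrete, here is a minimal sketch of a single GCN layer using the widely used symmetric normalisation with self-loops, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). This is an assumption for illustration: the abstract does not state which propagation rule AGCN uses, and the 3-node graph, features and weights below are hypothetical toy values.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """Aggregate neighbour features, apply a linear map, then ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]

# Toy path graph 0 - 1 - 2 with two input features per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0], [-1.0]]  # maps two input features to one hidden unit
hidden = gcn_layer(adj, features, weights)
```

Each output row mixes a node's own features with those of its neighbours, which is the relation-aware behaviour the abstract attributes to the GCN.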

Briefings in Functional Genomics, pp. 463-474.
Citations: 0
Adaptive deep propagation graph neural network for predicting miRNA-disease associations.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad010
Hua Hu, Huan Zhao, Tangbo Zhong, Xishang Dong, Lei Wang, Pengyong Han, Zhengwei Li

Background: A large number of experiments show that the abnormal expression of miRNA is closely related to the occurrence, diagnosis and treatment of diseases. Identifying associations between miRNAs and diseases is important for clinical applications of complex human diseases. However, traditional biological experimental methods and calculation-based methods have many limitations, which has motivated the development of more efficient and accurate deep learning methods for predicting miRNA-disease associations.

Results: In this paper, we propose a novel model based on an adaptive deep propagation graph neural network to predict miRNA-disease associations (ADPMDA). We first construct the miRNA-disease heterogeneous graph based on known miRNA-disease pairs, miRNA integrated similarity information, miRNA sequence information and disease similarity information. Then, we project the features of miRNAs and diseases into a low-dimensional space. After that, an attention mechanism is used to aggregate the local features of central nodes. In particular, an adaptive deep propagation graph neural network is employed to learn the embedding of nodes, which can adaptively adjust the local and global information of nodes. Finally, a multi-layer perceptron is used to score miRNA-disease pairs.

Conclusion: Experiments on the Human microRNA Disease Database (HMDD) v3.0 dataset show that ADPMDA achieves a mean AUC of 94.75% under 5-fold cross-validation. We further conduct case studies on esophageal neoplasms, lung neoplasms and lymphoma to confirm the effectiveness of our proposed model, and 49, 49 and 47 of the top 50 predicted miRNAs associated with these diseases are confirmed, respectively. These results demonstrate the effectiveness and superiority of our model in predicting miRNA-disease associations.
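
The evaluation protocol quoted above (mean AUC under 5-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are fixed random placeholders rather than model outputs, the folds are stratified by label, and `auc`, `stratified_folds` and `cross_validated_auc` are hypothetical helper names. In a real experiment the model would be retrained on the other four folds before scoring each held-out fold.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds

def cross_validated_auc(scores, labels, k=5, seed=0):
    """Mean of the AUC values computed separately on each held-out fold."""
    fold_aucs = [auc([scores[i] for i in fold], [labels[i] for i in fold])
                 for fold in stratified_folds(labels, k, seed)]
    return sum(fold_aucs) / len(fold_aucs)

# Placeholder data: 10 known associations (1) and 10 sampled non-associations (0).
labels = [1] * 10 + [0] * 10
rng = random.Random(1)
scores = ([rng.uniform(0.3, 1.0) for _ in range(10)]
          + [rng.uniform(0.0, 0.7) for _ in range(10)])
mean_auc = cross_validated_auc(scores, labels)
```

Stratifying the folds guarantees that every held-out fold contains both classes, so the per-fold AUC is always defined.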

Briefings in Functional Genomics, pp. 453-462.
Citations: 1
RBPLight: a computational tool for discovery of plant-specific RNA-binding proteins using light gradient boosting machine and ensemble of evolutionary features.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad016
Upendra K Pradhan, Prabina K Meher, Sanchita Naha, Soumen Pal, Sagar Gupta, Ajit Gupta, Rajender Parsad

RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Thus, accurate identification of RBPs is important for understanding gene expression and the regulation of cell state. A number of computational models have been developed to detect RBPs. These methods made use of datasets from several eukaryotic species, specifically from mice and humans. Although some models have been tested on Arabidopsis, these techniques fall short of correctly identifying RBPs for other plant species. Therefore, a powerful computational model for identifying plant-specific RBPs is needed. In this study, we present a novel computational model for locating RBPs in plants. Five deep learning models and ten shallow learning algorithms were used for prediction with 20 sequence-derived and 20 evolutionary feature sets. The highest repeated five-fold cross-validation accuracy, 91.24% AU-ROC and 91.91% AU-PRC, was achieved by the light gradient boosting machine. When evaluated using an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs than the currently available state-of-the-art RBP prediction models. Although certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computational model for the discovery of plant-specific RBPs. The web server RBPLight, publicly accessible at https://iasri-sg.icar.gov.in/rbplight/, was also developed to help researchers identify RBPs in plants.
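
The AU-PRC figures quoted above summarise the precision-recall curve in a single number. The sketch below shows one common estimator, the step-wise sum usually called average precision; this choice is an assumption, since the abstract does not say how the area was computed, and the scores and labels are hypothetical. Tied scores are not treated specially here.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve.

    Samples are ranked by decreasing score; precision is accumulated at the
    rank of every true positive and averaged over the number of positives.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for position, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / position  # precision at this recall step
    return total / n_pos

# Hypothetical predictions: 1 = RBP, 0 = non-RBP.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

Unlike AU-ROC, this quantity depends on the class balance, which is why the two are often reported together, as in the abstract.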

Briefings in Functional Genomics, pp. 401-410.
Citations: 0
Accurate prediction and key protein sequence feature identification of cyclins.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad014
Shaoyou Yu, Bo Liao, Wen Zhu, Dejun Peng, Fangxiang Wu

Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding their function. However, due to the low similarity between cyclin protein sequences, a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. Then the features were ranked and selected with maximum relevance-maximum distance (MRMD) 1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset in terms of both dimensionality and performance. Therefore, our work provides a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.
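
As an illustration of the auto-cross covariance (ACC) transform named in this abstract, the sketch below uses one common definition: for a profile with L positions and a lag lg, the covariance between column a at position j and column b at position j + lg is averaged over the L - lg valid positions. Whether the paper uses exactly this normalisation is an assumption, and the tiny profile and the name `acc_features` are hypothetical.

```python
def column_means(profile):
    """Mean score of every profile column over all sequence positions."""
    n_rows, n_cols = len(profile), len(profile[0])
    return [sum(row[c] for row in profile) / n_rows for c in range(n_cols)]

def acc_features(profile, max_lag):
    """Auto-cross covariance features of a position-by-column profile.

    profile: L rows (sequence positions) by C columns (for example the 20
    amino-acid scores of a PSSM). For each lag from 1 to max_lag, returns
    all C * C lagged covariances: auto covariance when a == b, cross
    covariance otherwise.
    """
    length, n_cols = len(profile), len(profile[0])
    means = column_means(profile)
    features = []
    for lag in range(1, max_lag + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum((profile[j][a] - means[a])
                            * (profile[j + lag][b] - means[b])
                            for j in range(length - lag))
                features.append(total / (length - lag))
    return features

# Hypothetical 4-position, 2-column profile (a real PSSM has 20 columns).
toy_profile = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
toy_features = acc_features(toy_profile, max_lag=1)
```

The output length depends only on the number of columns and the maximum lag, not on the sequence length, which is what lets proteins of different lengths share one fixed-size feature vector before feature selection.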

Briefings in Functional Genomics, pp. 411-419.
Citations: 1
ncRNALocate-EL: a multi-label ncRNA subcellular locality prediction model based on ensemble learning.
IF 4 | Zone 3 (Biology) | Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Pub Date: 2023-11-10 | DOI: 10.1093/bfgp/elad007
Tao Bai, Bin Liu

Subcellular localizations of ncRNAs are associated with specific functions. Currently, an increasing number of biological researchers are focusing on computational approaches to identify subcellular localizations of ncRNAs. However, the performance of the existing computational methods is low and needs to be further studied. First, most prediction models are trained with outdated databases. Second, only a few predictors can identify multiple subcellular localizations simultaneously. In this work, we establish three human ncRNA subcellular datasets based on the latest RNALocate, covering lncRNA, miRNA and snoRNA, and then propose a novel multi-label classification model based on ensemble learning, called ncRNALocate-EL, to identify multi-label subcellular localizations of these three ncRNA types. The results show that ncRNALocate-EL outperforms previous methods. Our method achieved an average precision of 0.709, 0.977 and 0.730 on the three human ncRNA datasets. The web server of ncRNALocate-EL has been established and can be accessed at https://bliulab.net/ncRNALocate-EL.
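
For multi-label prediction, "average precision" is often computed per example from the ranking of label scores. The sketch below follows that example-based definition as an assumption, since the abstract does not define the metric it reports; the label indices stand for hypothetical subcellular compartments and the function names are illustrative.

```python
def example_average_precision(scores, relevant):
    """Ranking-based average precision for one sample.

    scores: one score per candidate label; relevant: non-empty set of true
    label indices. For each true label, take the fraction of labels ranked
    at or above it that are also true, then average over the true labels.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = {label: position + 1 for position, label in enumerate(order)}
    total = 0.0
    for label in relevant:
        true_at_or_above = sum(1 for other in relevant
                               if rank[other] <= rank[label])
        total += true_at_or_above / rank[label]
    return total / len(relevant)

def mean_average_precision(all_scores, all_relevant):
    """Average the per-sample values over the whole dataset."""
    values = [example_average_precision(s, r)
              for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values)

# Two hypothetical ncRNAs scored against candidate compartments.
sample_a = example_average_precision([0.9, 0.1, 0.8, 0.3], {0, 3})
sample_b = example_average_precision([0.2, 0.7, 0.1], {1})
dataset_value = mean_average_precision(
    [[0.9, 0.1, 0.8, 0.3], [0.2, 0.7, 0.1]], [{0, 3}, {1}])
```

A value of 1.0 means every true localization is ranked above every false one for that sample, so the metric rewards correct ordering of labels rather than a particular decision threshold.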

Briefings in Functional Genomics, pp. 442-452.
Citations: 0
AAFL: automatic association feature learning for gene signature identification of cancer subtypes in single-cell RNA-seq data.
IF 4 Zone 3, Biology Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2023-11-10 DOI: 10.1093/bfgp/elac047
Meng Huang, Changzhou Long, Jiangtao Ma

Single-cell RNA-sequencing (scRNA-seq) technologies have enabled the study of human cancers in individual cells, which explores the cellular heterogeneity and the genotypic status of tumors. Gene signature identification plays an important role in the precise classification of cancer subtypes. However, most existing gene selection methods only select the same informative genes for each subtype. In this study, we propose a novel gene selection method, automatic association feature learning (AAFL), which automatically identifies different gene signatures for different cell subpopulations (cancer subtypes) at the same time. The proposed AAFL method combines the residual network with the low-rank network, which selects genes that are most associated with the corresponding cell subpopulations. Moreover, the differential expression genes are acquired before gene selection to filter the redundant genes. We apply the proposed feature learning method to the real cancer scRNA-seq data sets (melanoma) to identify cancer subtypes and detect gene signatures of identified cancer subtypes. The experimental results demonstrate that the proposed method can automatically identify different gene signatures for identified cancer subtypes. Gene ontology enrichment analysis shows that the identified gene signatures of different subtypes reveal the key biological processes and pathways. These gene signatures are expected to bring important implications for understanding cellular heterogeneity and the complex ecosystem of tumors.

Citations: 2
Machine learning applications on intratumoral heterogeneity in glioblastoma using single-cell RNA sequencing data.
IF 2.5 Zone 3, Biology Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2023-11-10 DOI: 10.1093/bfgp/elad002
Harold Brayan Arteaga-Arteaga, Mariana S Candamil-Cortés, Brian Breaux, Pablo Guillen-Rondon, Simon Orozco-Arias, Reinel Tabares-Soto

Artificial intelligence is revolutionizing all fields that affect people's lives and health. One of the most critical applications is in the study of tumors. It is the case of glioblastoma (GBM) that has behaviors that need to be understood to develop effective therapies. Due to advances in single-cell RNA sequencing (scRNA-seq), it is possible to understand the cellular and molecular heterogeneity in the GBM. Given that there are different cell groups in these tumors, there is a need to apply Machine Learning (ML) algorithms. It will allow extracting information to understand how cancer changes and broaden the search for effective treatments. We proposed multiple comparisons of ML algorithms to classify cell groups based on the GBM scRNA-seq data. This broad comparison spectrum can show the scientific-medical community which models can achieve the best performance in this task. In this work are classified the following cell groups: Tumor Core (TC), Tumor Periphery (TP) and Normal Periphery (NP), in binary and multi-class scenarios. This work presents the biomarker candidates found for the models with the best results. The analyses presented here allow us to verify the biomarker candidates to understand the genetic characteristics of GBM, which may be affected by a suitable identification of GBM heterogeneity. This work obtained for the four scenarios covered cross-validation results of $93.03\% \pm 5.37\%$, $97.42\% \pm 3.94\%$, $98.27\% \pm 1.81\%$ and $93.04\% \pm 6.88\%$ for the classification of TP versus TC, TP versus NP, NP versus TP and TC (TPC) and NP versus TP versus TC, respectively.
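
As a reading aid for the cross-validation figures quoted in the abstract above, the sketch below shows one common way a "mean ± spread" accuracy is produced: the per-fold accuracies are averaged and their sample standard deviation is reported alongside. The abstract does not state the number of folds or whether the spread is a standard deviation, so that interpretation, the fold values, and the helper name `summarize_folds` are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies, in percent.

    Hypothetical helper: it illustrates a common reporting convention,
    not the procedure used by the authors.
    """
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to estimate a spread")
    return mean(fold_accuracies) * 100, stdev(fold_accuracies) * 100


# Made-up per-fold accuracies for one binary scenario (illustrative values only).
folds = [0.90, 0.95, 1.00, 0.85, 0.95]
avg, spread = summarize_folds(folds)
print(f"{avg:.2f}% ± {spread:.2f}%")
```

A wide spread relative to the mean, as in some of the scenarios quoted above, would suggest that accuracy varies noticeably from fold to fold, which is worth keeping in mind when comparing models.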

Citations: 0