
Journal of Bioinformatics and Computational Biology: Latest Articles

Small groups in multidimensional feature space: two examples of supervised two-group classification from biomedicine
Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-11-07, DOI: 10.1142/s0219720023500257
Dmitriy Karpenko, Aleksei Bigildeev
Citations: 0
CNV-FB: A Feature bagging strategy-based approach to detect copy number variants from NGS data
Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-11-07, DOI: 10.1142/s0219720023500269
Chengyou Li, Shiqiang Fan, Haiyong Zhao, Xiaotong Liu
Citations: 0
Analyzing omics data by feature combinations based on kernel functions.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-10-01, Epub Date: 2023-10-18, DOI: 10.1142/S021972002350021X
Chao Li, Tianxiang Wang, Xiaohui Lin

Defining meaningful feature (molecule) combinations can enhance the study of disease diagnosis and prognosis. However, feature combinations in biosystems are complex and varied, and existing methods examine feature cooperation in a single, fixed pattern for all feature pairs, such as linear combination. To identify the appropriate combination between two features and evaluate feature combinations more comprehensively, this paper adopts kernel functions to study feature relationships and proposes a new omics data analysis method, KF-k-TSP. Besides linear combination, KF-k-TSP also explores nonlinear combinations of features and allows hybridizing multiple kernel functions to evaluate feature interaction from multiple views. KF-k-TSP selects k > 0 top-scoring pairs to build an ensemble classifier. Experimental results show that KF-k-TSP with multiple kernel functions, which evaluates feature combinations from multiple views, performs better than KF-k-TSP with only one kernel function. Meanwhile, KF-k-TSP performs better than TSP family algorithms and previous methods based on conversion strategy in most cases. It performs similarly to popular machine learning methods in omics data analysis but involves fewer feature pairs. In the course of physiological and pathological changes, molecular interactions can be both linear and nonlinear. Hence, KF-k-TSP, which can measure molecular combinations from multiple perspectives, can help to mine information closely related to physiological and pathological changes and to study disease mechanisms.
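
To make the pair-scoring idea concrete, here is a minimal Python sketch of the classic top-scoring-pair (TSP) score that the TSP family builds on. It is not the kernel-based method proposed in the abstract above, whose scoring function the abstract does not specify. The function names `tsp_score` and `top_k_pairs` and the toy data are illustrative assumptions.

```python
from itertools import combinations

def tsp_score(samples, labels, i, j):
    """Classic TSP score: |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|."""
    probs = []
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        hits = sum(1 for s in rows if s[i] < s[j])
        probs.append(hits / len(rows))
    return abs(probs[0] - probs[1])

def top_k_pairs(samples, labels, k):
    """Return the k feature pairs with the highest TSP score."""
    n_features = len(samples[0])
    scored = [(tsp_score(samples, labels, i, j), (i, j))
              for i, j in combinations(range(n_features), 2)]
    # Sort by descending score, breaking ties by pair index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]

# Hypothetical toy data: 4 samples, 3 features, two classes.
samples = [
    [1.0, 2.0, 5.0],
    [1.5, 3.0, 4.0],
    [3.0, 1.0, 4.5],
    [2.5, 0.5, 5.5],
]
labels = [0, 0, 1, 1]
best = top_k_pairs(samples, labels, k=1)
```

In this toy data, features 0 and 1 swap their ordering between the two classes, so that pair gets the maximum score. A kernel-based variant would presumably replace the simple `x_i < x_j` comparison with a comparison in a kernel-induced space, but that is an assumption, not something the abstract states.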

Citations: 0
Methods for cell-type annotation on scRNA-seq data: A recent overview.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-10-01, Epub Date: 2023-09-23, DOI: 10.1142/S0219720023400024
Konstantinos Lazaros, Panagiotis Vlamos, Aristidis G Vrahatis

The evolution of single-cell technology is ongoing, continually generating massive amounts of data that reveal many mysteries surrounding intricate diseases. However, the drawbacks of these technologies continue to constrain us. Among these, annotating cell types in single-cell gene expression data poses a substantial challenge, despite the myriad of tools at our disposal. The rapid growth in data, resources, and tools has consequently brought about significant changes in this area over the years. In our study, we spotlight all noteworthy cell type annotation techniques developed over the past four years. We provide an overview of the latest trends in this field, showcasing the most advanced methods in taxonomy. Our research underscores the demand for additional tools that incorporate biological context and predicts that graph neural network approaches will likely lead this research field in the coming years.

Citations: 0
AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-10-01, Epub Date: 2023-10-28, DOI: 10.1142/S0219720023500221
Jiasheng He, Shun Zhang, Chun Fang

The polyproline-II (PPII) structure domain is crucial in organisms' signal transduction, transcription, cell metabolism, and immune response. It is also a critical structural domain for specific vital disease-associated proteins. Recognizing PPII is essential for understanding protein structure and function. To accurately predict PPII in proteins, we propose a novel method, AAindex-PPII, which uses only amino acid indexes to characterize protein sequences and a composite deep learning model, combining a Bidirectional Gated Recurrent Unit (BiGRU) with an improved TextCNN, to predict PPII in proteins. Experimental results show that, when tested on the same datasets, our method outperforms the state-of-the-art BERT-PPII method, achieving an AUC of 0.845 on the strict data and 0.813 on the non-strict data, which are 0.024 and 0.03 higher, respectively, than those of BERT-PPII. This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.
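
As an illustration of what characterizing a sequence by an amino acid index can look like, the following Python sketch maps each residue to a numeric value and builds a fixed-length window around every position. The abstract does not say which indexes or window sizes the authors use, so the Kyte-Doolittle hydropathy scale, the window size, and the function names `encode_sequence` and `sliding_windows` are assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy values, used here as one example of an amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode_sequence(seq, index=KYTE_DOOLITTLE, unknown=0.0):
    """Replace each residue by its index value; unknown residues map to `unknown`."""
    return [index.get(res, unknown) for res in seq.upper()]

def sliding_windows(values, size, pad=0.0):
    """Return one window of odd length `size` per residue, padded at both ends."""
    half = size // 2
    padded = [pad] * half + list(values) + [pad] * half
    return [padded[i:i + size] for i in range(len(values))]

# Hypothetical peptide; "X" stands for an unknown residue.
encoded = encode_sequence("PPGAX")
windows = sliding_windows(encoded, size=3)
```

Each window could then serve as the per-residue input to a sequence model such as a recurrent or convolutional network. In practice several indexes would likely be stacked into a feature vector per residue rather than the single scalar used here.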

Citations: 0
iAMY-RECMFF: Identifying amyloidgenic peptides by using residue pairwise energy content matrix and features fusion algorithm.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-10-01, Epub Date: 2023-10-27, DOI: 10.1142/S0219720023500233
Zizheng Yu, Zhijian Yin, Hongliang Zou

Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. Therefore, it is crucial to distinguish amyloid from non-amyloid proteins or peptides. While experimental approaches are typically preferred, they are costly and time-consuming. In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidgenic from non-amyloidgenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then used Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were fed into a support vector machine to identify amyloidgenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidgenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool for identifying amyloidgenic peptides. To facilitate academic use, the dataset and code used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
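
The two dependence measures named in the abstract can be sketched in plain Python as follows. This is only an illustration of the standard definitions of Pearson's correlation and sample distance correlation on two numeric vectors; how the authors apply them to the energy content matrix is not described in the abstract, and the function names and toy vectors are assumptions.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def _centered_distances(v):
    """Pairwise distance matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation; it can detect nonlinear dependence."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    idx = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in idx) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in idx) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in idx) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# Toy vectors with a purely nonlinear (quadratic) relationship.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r = pearson(x, y)
d = distance_correlation(x, y)
```

On the quadratic toy data, Pearson's coefficient is zero while the distance correlation is clearly positive, which is one plausible reason to use the two measures together.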

Citations: 0
CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-10-01, Epub Date: 2023-10-28, DOI: 10.1142/S0219720023500245
Ying Zeng, Zheming Yuan, Yuan Chen, Ying Hu

O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequence characterization is performed by combining amino acid composition (AAC), undirected composition of k-spaced amino acid pairs (undirected-CKSAAP) and pseudo-position-specific scoring matrix (PsePSSM). The Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on a local chi-square test is designed, and cost-sensitive learning is then incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). Based on a 1:49 (positives:negatives) training set, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreover, independent test results on separate human and mouse glycoproteins show that CBDT-Oglyc outperforms previous methods in global accuracy. CBDT-Oglyc shows great promise in predicting Oglyc sites and is expected to facilitate further experimental studies on protein glycosylation.
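
Two of the sequence descriptors mentioned above are simple enough to sketch directly. The Python below computes amino acid composition and an order-insensitive composition of spaced residue pairs. It is a generic reading of these descriptors, not the authors' exact feature definition: the function names, the `gap` parameter (the number of residues between the two members of a pair), and the normalization are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: relative frequency of each of the 20 residues."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}

def undirected_spaced_pairs(seq, gap):
    """Frequencies of residue pairs separated by `gap` residues, ignoring order.

    "AT" and "TA" are counted as the same pair, which is what makes the
    composition undirected in this sketch.
    """
    pairs = Counter()
    for i in range(len(seq) - gap - 1):
        a, b = seq[i], seq[i + gap + 1]
        pairs["".join(sorted((a, b)))] += 1
    total = sum(pairs.values())
    return {p: c / total for p, c in pairs.items()} if total else {}

# Hypothetical sequence fragment around a candidate site.
fragment = "ASTSA"
composition = aac(fragment)
pair_freqs = undirected_spaced_pairs(fragment, gap=1)
```

Ignoring pair order roughly halves the number of pair features compared with the directed version, which may help when positives are as scarce as in a 1:49 training set.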

Citations: 0
DeepRT: Predicting compounds presence in pathway modules and classifying into module classes using deep neural networks based on molecular properties.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-08-01, Epub Date: 2023-08-24, DOI: 10.1142/S0219720023500178
Hayat Ali Shah, Juan Liu, Zhihui Yang, Feng Yang, Qiang Zhang, Jing Feng

Metabolic pathways play a crucial role in understanding the biochemistry of organisms. In metabolic pathways, modules refer to clusters of interconnected reactions or sub-networks representing specific functional units or biological processes within the overall pathway. In pathway modules, compounds are major elements and refer to the various molecules that participate in the biochemical reactions within the modules. These molecules can include substrates, intermediates and final products. Determining the presence relation between compounds and pathway modules is essential for synthesizing new molecules and predicting hidden reactions. To date, several computational methods have been proposed to address this problem. However, all of these methods predict only the metabolic pathways and their types, not the pathway modules. To address this issue, we propose a novel deep learning model, DeepRT, that integrates message passing neural networks (MPNNs) and a transformer encoder. This combination allows DeepRT to effectively extract global and local structure information from the molecular graph. The model is designed to perform two tasks: first, determining the presence relation between a compound and a pathway module, and second, predicting the relation between a query compound and module classes. The proposed DeepRT model was evaluated on a dataset comprising compounds and pathway modules, and it outperforms existing approaches.
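
The message-passing idea behind MPNNs can be illustrated with a deliberately simplified Python sketch: each atom updates its feature vector by adding the sum of its neighbors' vectors, and a readout sums all atom vectors into one molecule-level vector. This is not the DeepRT architecture; there are no learned weights and no transformer encoder, and the function names, one-hot atom features, and toy molecule are assumptions.

```python
def message_passing(node_feats, edges, rounds=1):
    """Sum-aggregation message passing on an undirected molecular graph."""
    neighbors = {n: [] for n in node_feats}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    state = {n: list(f) for n, f in node_feats.items()}
    for _ in range(rounds):
        new_state = {}
        for n, feat in state.items():
            # Aggregate: element-wise sum of the neighbors' current vectors.
            agg = [0.0] * len(feat)
            for m in neighbors[n]:
                agg = [a + b for a, b in zip(agg, state[m])]
            # Update: add the aggregated message to the node's own vector.
            new_state[n] = [f + a for f, a in zip(feat, agg)]
        state = new_state
    return state

def readout(state):
    """Sum all node vectors into a single graph-level vector."""
    dim = len(next(iter(state.values())))
    return [sum(v[d] for v in state.values()) for d in range(dim)]

# Heavy atoms of ethanol (C-C-O) with one-hot features [is_carbon, is_oxygen].
atoms = {0: [1, 0], 1: [1, 0], 2: [0, 1]}
bonds = [(0, 1), (1, 2)]
state = message_passing(atoms, bonds, rounds=1)
graph_vector = readout(state)
```

After one round each atom's vector reflects its immediate neighborhood (local structure). More rounds, or an attention mechanism over all atoms, would bring in information from farther away in the graph.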

Journal of Bioinformatics and Computational Biology, Volume 21, Issue 4, Article 2350017
Citations: 0
Multi-omics data analysis reveals the biological implications of alternative splicing events in lung adenocarcinoma.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2023-08-01, Epub Date: 2023-09-08, DOI: 10.1142/S0219720023500208
Fuyan Hu, Bifeng Chen, Qing Wang, Zhiyuan Yang, Man Chu

Cancer is characterized by the dysregulation of alternative splicing (AS). However, the comprehensive regulatory mechanisms of AS in lung adenocarcinoma (LUAD) are poorly understood. Here, we present the AS landscape in LUAD based on integrated analyses of LUAD multi-omics data. We identified 13,995 AS events in 6,309 genes as differentially expressed alternative splicing events (DEASEs), mainly covering protein-coding genes. These DEASEs were strongly linked to "cancer hallmarks", such as apoptosis, DNA repair, cell cycle, cell proliferation, angiogenesis, immune response, generation of precursor metabolites and energy, the p53 signaling pathway and the PI3K-AKT signaling pathway. We further built a regulatory network connecting splicing factors (SFs) and DEASEs. In addition, RNA-binding protein (RBP) mutations that can affect DEASEs were investigated to find potential cancer drivers. Further association analysis demonstrated that DNA methylation levels were highly correlated with DEASEs. In summary, our results bring new insight into the mechanism of AS and provide novel biomarkers for personalized medicine in LUAD.

Journal of Bioinformatics and Computational Biology, Volume 21, Issue 4, Article 2350020
Citations: 0
A model-based clustering algorithm with covariates adjustment and its application to lung cancer stratification.
IF 1 Zone 4, Biology Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-08-01 Epub Date: 2023-09-08 DOI: 10.1142/S0219720023500191
Carlos E M Relvas, Asuka Nakata, Guoan Chen, David G Beer, Noriko Gotoh, Andre Fujita
Clustering is often the first step in a data analysis. It can reveal patterns that had gone unnoticed and help raise new hypotheses. However, one challenge when analyzing empirical data is the presence of covariates, which may mask the clustering structure of interest. For example, suppose we want to cluster a set of individuals into controls and cancer patients. A clustering algorithm could instead group the subjects into young and elderly, which may happen because age at diagnosis is associated with cancer. We therefore developed CEM-Co, a model-based clustering algorithm that removes or minimizes the effects of undesirable covariates during the clustering process. We applied CEM-Co to a gene expression dataset of 129 stage I non-small cell lung cancer patients and identified a subgroup with a poorer prognosis, which standard clustering algorithms failed to detect.
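The covariate problem described in this abstract can be illustrated with a small simulation. The sketch below is not CEM-Co: it swaps in a much simpler two-step stand-in (regress the covariate out, then run a plain 1-D two-means clustering), whereas the abstract describes adjusting for covariates during model-based clustering. All names, effect sizes, the age cutoff of 55, and the assumption that age is independent of group are hypothetical choices made only for illustration.

```python
import random
import statistics


def two_means_1d(values, iters=50):
    """Plain 1-D k-means with k=2; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)  # initial centers: the two extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        new_lo = statistics.fmean(v for v, l in zip(values, labels) if l == 0)
        new_hi = statistics.fmean(v for v, l in zip(values, labels) if l == 1)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return labels


def residualize(y, x):
    """Residuals of a simple least-squares fit y ~ a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def agreement(a, b):
    """Fraction of matching labels, allowing for swapped cluster names."""
    same = sum(ai == bi for ai, bi in zip(a, b)) / len(a)
    return max(same, 1 - same)


# Hypothetical data: one expression value per subject, driven strongly by
# age (the unwanted covariate) and more weakly by the group of interest.
rng = random.Random(0)
n = 200
group = [i % 2 for i in range(n)]              # 0 = control, 1 = patient
age = [rng.uniform(30, 80) for _ in range(n)]  # assumed independent of group
expr = [0.2 * a + 2.0 * g + rng.gauss(0, 0.5) for a, g in zip(age, group)]

raw_labels = two_means_1d(expr)                    # clustering without adjustment
adj_labels = two_means_1d(residualize(expr, age))  # clustering after removing age
elderly = [int(a > 55) for a in age]               # arbitrary young/elderly split

print("raw vs group:     ", agreement(raw_labels, group))
print("raw vs elderly:   ", agreement(raw_labels, elderly))
print("adjusted vs group:", agreement(adj_labels, group))
```

In this toy setting the unadjusted clusters should track the young/elderly split much more closely than the control/patient split, and the adjusted clusters should do the opposite. If the covariate were correlated with the group of interest, as the abstract suggests for age and cancer, regressing it out first could also remove part of the signal, which is one motivation for handling covariates inside the clustering model instead.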
Journal of Bioinformatics and Computational Biology, 21(4), 2350019.
Citations: 0