
Journal of Bioinformatics and Computational Biology: Latest Articles

RPfam: A refiner towards curated-like multiple sequence alignments of the Pfam protein families
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-14, DOI: 10.1142/S0219720022400029
Qingting Wei, Hong Zou, Cuncong Zhong, Jianfeng Xu
High-quality multiple sequence alignments (MSAs) can provide insights into the architecture and function of protein families. Existing MSA tools often generate results inconsistent with the biological distribution of conserved regions because they position amino acid residues and gaps only by symbols. We propose RPfam, a refiner towards curated-like MSAs for modeling the protein families in the Pfam database. RPfam refines automatic alignments by scoring alignments based on the PFASUM matrix, restricting realignments to badly aligned blocks, optimizing the block scores by dynamic programming, and running refinements iteratively using the simulated annealing algorithm. Experiments show that RPfam effectively refined the alignments produced by the MSA tools ClustalO and Muscle with reference to the curated seed alignments of the Pfam protein families. In particular, RPfam improved the quality of the ClustalO alignments by 4.4% and the Muscle alignments by 2.8% on the gp32 DNA binding protein-like family. A supplementary table is available at http://www.worldscinet.com/jbcb/.
Citations: 0
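The RPfam abstract above names two ingredients that are easy to illustrate: scoring an alignment with a substitution matrix and accepting or rejecting a realignment with simulated annealing. The sketch below is only an illustration under stated assumptions. `toy_score` is a hypothetical stand-in for a PFASUM lookup, the gap penalty is arbitrary, and the acceptance rule is the generic Metropolis criterion, not necessarily the exact rule RPfam uses.

```python
import math
import random


def toy_score(a, b):
    # Hypothetical stand-in for a PFASUM lookup: +2 for a match, -1 otherwise.
    return 2.0 if a == b else -1.0


def sum_of_pairs(column, score, gap_penalty=-1.0):
    """Score one alignment column as the sum over all pairs of symbols."""
    total = 0.0
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            a, b = column[i], column[j]
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing in this sketch
            if a == "-" or b == "-":
                total += gap_penalty
            else:
                total += score(a, b)
    return total


def accept(old_score, new_score, temperature, rng=random):
    """Metropolis rule: keep improvements, occasionally keep a worse block."""
    if new_score >= old_score:
        return True
    return rng.random() < math.exp((new_score - old_score) / temperature)
```

For example, `sum_of_pairs("AA-", toy_score)` adds one match and two residue-gap pairs. Lowering `temperature` over the iterations makes the refinement progressively less willing to accept a block whose score dropped.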
Analysis to determine the effect of mutations on binding to small chemical molecules
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-14, DOI: 10.1142/S0219720022400030
T. Koshlan, K. Kulikov
In this paper, the authors present and describe in detail an original software-implemented numerical methodology for determining the effect of mutations on binding to small chemical molecules, using as examples the binding of gefitinib, AMPPNP, CO-1686, ASP8273, and erlotinib to the EGFR protein and the binding of imatinib to PPARgamma. Furthermore, the developed numerical approach makes it possible to determine the stability of a molecular complex consisting of a protein and a small chemical molecule. A description of the software package that implements the presented algorithm is given on the website https://binomlabs.com/.
Citations: 0
Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-01, Epub Date: 2021-12-17, DOI: 10.1142/S0219720021500359
Akram Emdadi, Changiz Eslahchi

Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different knowledge on cell lines and drugs. This paper proposes the TCLMF method, a model for predicting drug response in tumor samples that is trained on preclinical samples and based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs, and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset to train the proposed drug response model, which we then use to predict the drug sensitivity of samples from The Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability that a tumor sample is sensitive to a drug. The closest cell line neighbours of each tumor sample are determined using a measure of similarity between tumor samples and cell lines. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with those of previously proposed methods, using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied. For 7 of the 12 drugs, TCLMF can significantly distinguish between patients who are resistant to these drugs and patients who are sensitive to them. In the second approach, the methods are converted to classification models using a threshold, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions relative to the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.

Citations: 0
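The TCLMF abstract above describes predicting a new tumor's response by averaging the low-rank features of its nearest cell lines and turning the result into a probability. A minimal sketch of that step follows, assuming the latent vectors have already been learned. The function names, the choice of `k`, and the plain dot-product logistic model without bias terms are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nearest_neighbors(similarities, k):
    """Indices of the k cell lines most similar to the tumor sample."""
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    return order[:k]


def mean_vector(vectors):
    """Element-wise mean of equally sized latent vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def predict_sensitivity(tumor_similarities, cell_line_factors, drug_factor, k=2):
    """Probability that a tumor is sensitive to a drug (hypothetical helper)."""
    idx = nearest_neighbors(tumor_similarities, k)
    # The tumor's latent vector is the average over its neighboring cell lines.
    tumor_factor = mean_vector([cell_line_factors[i] for i in idx])
    dot = sum(a * b for a, b in zip(tumor_factor, drug_factor))
    return sigmoid(dot)
```

With three cell lines whose factors are `[1, 0]`, `[0, 1]`, and `[-1, -1]`, similarities `[0.9, 0.8, 0.1]`, and `k=2`, the tumor vector becomes `[0.5, 0.5]`; a threshold on the returned probability would then give the sensitive/resistant classification mentioned in the abstract.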
Optimized splitting of mixed-species RNA sequencing data.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-01, Epub Date: 2022-01-06, DOI: 10.1142/S0219720022500019
Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are then re-aligned with the individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9081140/pdf/nihms-1770823.pdf
Citations: 0
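The alignment-dependent strategy in the abstract above classifies each read by species from its best alignments against a pooled human and mouse index. The sketch below illustrates that binning step only. It assumes each read already has a best alignment score per species (or `None` when no alignment exists); the `min_margin` tie rule and the bin names are hypothetical choices, since the abstract does not specify them.

```python
def classify_read(human_score, mouse_score, min_margin=1):
    """Assign a read to the species with the clearly better alignment score."""
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score >= min_margin:
        return "human"
    if mouse_score - human_score >= min_margin:
        return "mouse"
    return "ambiguous"  # scores too close to call


def split_reads(best_scores):
    """Group read IDs into per-species bins for later re-alignment."""
    bins = {"human": [], "mouse": [], "ambiguous": [], "unaligned": []}
    for read_id, (human_score, mouse_score) in best_scores.items():
        bins[classify_read(human_score, mouse_score)].append(read_id)
    return bins
```

The `human` and `mouse` bins would then be re-aligned against the corresponding single-species genome, as the abstract describes; how ambiguous reads are handled is left open here.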
A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-01, Epub Date: 2022-02-21, DOI: 10.1142/S0219720022500032
Die Zhang, Shunfang Wang

The succinylation modification of proteins participates in the regulation of a variety of cellular processes. Identification of modified substrates with precise sites is the basis for understanding the molecular mechanism and regulation of succinylation. In this work, we selected five superior feature encodings, CKSAAP, ACF, BLOSUM62, AAindex, and one-hot, according to their performance on the succinylation site prediction problem. Then, an LSTM network and a CNN were used to construct four models: LSTM-CNN, CNN-LSTM, LSTM, and CNN. Each of the five selected features was input into each of these four models for training in order to compare the models. Based on the performance of each model, the best of them was chosen to construct a hybrid model, DeepSucc, composed of five sub-modules for integrating heterogeneous information. Under 10-fold cross-validation, the hybrid model DeepSucc achieves 86.26% accuracy, 84.94% specificity, 87.57% sensitivity, 0.9406 AUC, and 0.7254 MCC. When compared with other prediction tools on an independent test set, DeepSucc outperformed them in sensitivity and MCC. The datasets and source code can be accessed at https://github.com/1835174863zd/DeepSucc.

Citations: 4
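The DeepSucc abstract above reports accuracy, specificity, sensitivity, and MCC. For readers less familiar with these measures, the sketch below computes them from the four cells of a binary confusion matrix using the standard definitions. The counts in the usage note are made up for illustration and are not the paper's data.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Denominator of the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Calling `binary_metrics(tp=40, tn=45, fp=5, fn=10)` gives 0.85 accuracy, 0.80 sensitivity, and 0.90 specificity. This sketch assumes both classes are present, so the sensitivity and specificity denominators are non-zero.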
Tensor decomposition based on the potential low-rank and p-shrinkage generalized threshold algorithm for analyzing cancer multiomics data.
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-04-01, Epub Date: 2022-02-21, DOI: 10.1142/S0219720022500020
Hang-Jin Yang, Yu-Xia Lei, Juan Wang, Xiang-Zhen Kong, Jin-Xing Liu, Ying-Lian Gao

Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition ([Formula: see text]-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. In addition, the tensor nuclear norm (TNN) defined on the basis of [Formula: see text]-SVD applies the same standard to all singular values. Because TNN ignores the differences among singular values, the main information that should be well preserved is not retained. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. The potential low-rank space may contain important information, so we learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers. The p-shrinkage generalized threshold function is used to preserve the low-rank properties of larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set. The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.

Citations: 0
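The abstract above relies on a p-shrinkage threshold that penalizes large singular values less than small ones. The abstract does not give the formula, so the sketch below uses the p-shrinkage operator as it is commonly written in the sparse-recovery literature, sign(x) * max(|x| - lam^(2-p) * |x|^(p-1), 0), which reduces to ordinary soft thresholding at p = 1. Treat the exact form, and the helper names, as assumptions rather than the paper's definition.

```python
import math


def p_shrink(x, lam, p):
    """Generalized p-shrinkage of a scalar x (assumes lam > 0 and p <= 1)."""
    mag = abs(x)
    if mag == 0.0:
        return 0.0  # avoid 0 ** negative power when p < 1
    shrunk = mag - (lam ** (2.0 - p)) * (mag ** (p - 1.0))
    return math.copysign(max(shrunk, 0.0), x)


def shrink_singular_values(singular_values, lam, p):
    """Apply p-shrinkage to each singular value independently."""
    return [p_shrink(s, lam, p) for s in singular_values]
```

With `lam = 1`, soft thresholding (`p = 1`) maps a singular value of 4.0 to 3.0, while `p = 0.5` maps it to 3.5 and still zeroes out a value of 0.5. That is the sense in which larger singular values are better preserved.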
RNA modification writers influence tumor microenvironment in gastric cancer and prospects of targeted drug therapy
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-03-14, DOI: 10.1142/S0219720022500044
P. Song, Sheng Zhou, Xiaoyang Qi, Y. Jiao, Y. Gong, Jie Zhao, Haojun Yang, Z. Qian, J. Qian, Liming Tang
Background: RNA adenosine modifications are crucial for regulating RNA levels. N6-methyladenosine (m6A), N1-methyladenosine (m1A), adenosine-to-inosine RNA editing, and alternative polyadenylation (APA) are four major RNA modification types. Methods: We evaluated the altered mRNA expression profiles of 27 RNA modification enzymes and compared the differences in tumor microenvironment (TME) and clinical prognosis between two RNA modification patterns using unsupervised clustering. Then, we constructed a scoring system, WM_score, and quantified the RNA modifications in patients with gastric cancer (GC), associating WM_score with TME, clinical outcomes, and effectiveness of targeted therapies. Results: RNA adenosine modifications strongly correlated with TME and could predict the degree of TME cell infiltration, genetic variation, and clinical prognosis. Two modification patterns were identified according to high and low WM_scores. Tumors in the WM_score-high subgroup were closely linked with survival advantage, CD4[Formula: see text] T-cell infiltration, high tumor mutation burden, and cell cycle signaling pathways, whereas those in the WM_score-low subgroup showed strong infiltration of inflammatory cells and poor survival. Regarding the immunotherapy response, a high WM_score correlated significantly with PD-L1 expression, predicting the effect of PD-L1 blockade therapy. Conclusion: The WM_score system could facilitate scoring and prediction of GC prognosis.
Citations: 0
A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction
IF 1, Zone 4 (Biology), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2022-03-09, DOI: 10.1142/S0219720022500056
Santhosh Amilpur, Raju Bhukya
Enhancers are short regulatory DNA fragments that are bound by proteins called activators. They are free-bound and distant elements that play a vital role in controlling gene expression. Identifying enhancers and their strength is challenging because of their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency need further improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from the improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches were concatenated and then passed through a simple artificial neural network (ANN) to identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on a benchmark chromatin dataset covering nine cell lines. A 10-fold cross-validation method is employed to evaluate the model's performance. The results show that the proposed model performs very well in predicting enhancers, with an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903, and also performs fairly well on an independent test when compared with all other existing methods.
Citations: 1
Identification of cancer-related module in protein-protein interaction network based on gene prioritization.
IF 1 Zone 4 Biology Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2022-02-01 Epub Date: 2021-12-03 DOI: 10.1142/S0219720021500311
Jingli Wu, Qi Zhang, Gaoshi Li

With the rapid development of deep sequencing technologies, a large amount of high-throughput data has become available for studying carcinogenic mechanisms at the molecular level. It is widely accepted that the development and progression of cancer are regulated by modules/pathways rather than individual genes, and the identification of cancer-related active modules has received extensive attention. In this paper, we put forward an identification method, ModFinder, that integrates biological networks and gene expression profiles. More concretely, a gene scoring function is devised by using a regression model with a [Formula: see text]-step random walk kernel, and the genes are ranked according to both their active scores and their degrees in the PPI network. A greedy algorithm, NSEA, is then introduced to find an active module with a high score and strong connectivity. Experiments were performed on both simulated data and real biological data, i.e. breast cancer and cervical cancer. Compared with the previous methods SigMod, LEAN and RegMod, ModFinder shows competitive performance. It can successfully identify a well-connected module that contains a large proportion of cancer-related genes, including some well-known oncogenes or tumor suppressors enriched in cancer-related pathways.
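The general idea of growing a connected, high-scoring module by greedy search can be sketched as follows. This is a generic seed-expansion heuristic written for illustration only: the abstract does not describe how NSEA works, so the function `greedy_module`, its `max_size` stopping rule, and the toy network are all assumptions and should not be read as the authors' algorithm.

```python
def greedy_module(adjacency, scores, max_size=5):
    """Grow a connected module from the top-scoring gene.

    adjacency: dict mapping each gene to an iterable of its PPI neighbours.
    scores:    dict mapping each gene to its activity score.
    At each step the highest-scoring gene adjacent to the current module is
    added, so the module stays connected by construction. Ties are broken
    alphabetically to keep the result deterministic.
    """
    seed = max(sorted(scores), key=scores.get)
    module = [seed]
    members = {seed}
    while len(module) < max_size:
        # Genes outside the module that touch at least one member.
        frontier = {
            neighbour
            for gene in members
            for neighbour in adjacency.get(gene, ())
            if neighbour not in members
        }
        if not frontier:
            break  # the connected component is exhausted
        best = max(sorted(frontier), key=lambda g: scores.get(g, 0.0))
        module.append(best)
        members.add(best)
    return module


if __name__ == "__main__":
    # Hypothetical toy network and scores, not data from the paper.
    toy_network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"], "E": []}
    toy_scores = {"A": 0.9, "B": 0.2, "C": 0.5, "D": 0.8, "E": 0.1}
    print(greedy_module(toy_network, toy_scores, max_size=4))
```

Note the trade-off this simple rule exposes: gene D has a high score but is reached only through the low-scoring gene B, which is why methods in this area weigh connectivity alongside individual gene scores.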
Citations: 0
A new Bayesian approach for QTL mapping of family data.
IF 1 Zone 4 Biology Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2022-02-01 Epub Date: 2021-11-19 DOI: 10.1142/S021972002150030X
Daiane Aparecida Zuanetti, Luis Aparecido Milan

In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs' effects. The model accounts for the detailed familial dependence and does not rely on random effects. It combines the probability of Mendelian inheritance of the parents' genotypes with the correlation between flanking markers and QTLs. This is an advance over models that use only Mendelian segregation or only the correlation between markers and QTLs to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their locations, and their additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under the tested conditions, the proposed method outperforms the other methods in estimating the number of QTLs, the accuracy of the QTLs' positions, and the estimates of their effects. The results of applying the proposed method to the data sets exceeded all of our expectations.
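Two ingredients mentioned in the abstract, Mendelian transmission from parents to offspring and additive and dominance effects, can be illustrated with a short sketch. It assumes a single biallelic locus with alleles written `Q` and `q` and uses the textbook coding in which the genotypic values of QQ, Qq and qq are mu + a, mu + d and mu - a. The function names and numbers are hypothetical; this is not the authors' Bayesian model, which also uses flanking-marker information.

```python
from collections import Counter
from itertools import product


def offspring_genotype_probs(father, mother):
    """Mendelian segregation at one biallelic locus.

    Each parent genotype is a two-character string such as "Qq". Each parent
    transmits one of its two alleles with probability 1/2, independently of
    the other parent, giving four equally likely allele combinations.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(father, mother))
    return {genotype: n / 4 for genotype, n in counts.items()}


def genotypic_value(genotype, mu, a, d):
    """Textbook additive/dominance coding (an assumption of this sketch)."""
    if genotype == "QQ":
        return mu + a
    if genotype == "qq":
        return mu - a
    return mu + d  # heterozygote Qq


def expected_offspring_value(father, mother, mu, a, d):
    """Average genotypic value over the offspring genotype distribution."""
    probs = offspring_genotype_probs(father, mother)
    return sum(p * genotypic_value(g, mu, a, d) for g, p in probs.items())


if __name__ == "__main__":
    print(offspring_genotype_probs("Qq", "Qq"))
    print(expected_offspring_value("Qq", "Qq", mu=10.0, a=2.0, d=1.0))
```

With two heterozygous parents the additive contributions cancel on average, so the expected offspring value differs from mu only through the dominance effect.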
Citations: 0