
Journal of Bioinformatics and Computational Biology: Latest Articles

Optimized splitting of mixed-species RNA sequencing data.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-04-01 | Epub Date: 2022-01-06 | DOI: 10.1142/S0219720022500019
Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, mRNA sequence similarity among species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species and the classified reads are then re-aligned to the individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
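The idea of classifying reads by their best alignment in a pooled index can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_read`, the `min_margin` parameter, the score values, and the "ambiguous"/"unaligned" labels are all assumptions made for illustration, and the per-species best scores are assumed to have been extracted from an aligner's output beforehand.

```python
from collections import Counter

def classify_read(best_scores, min_margin=1):
    """Assign a read to the species with the best alignment score.

    best_scores maps a species name to the read's best alignment score
    against that species' contigs in the pooled index (None if unaligned).
    min_margin is a hypothetical minimum score gap required to call a species.
    """
    scored = {sp: s for sp, s in best_scores.items() if s is not None}
    if not scored:
        return "unaligned"
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (top_species, top), (_, second) = ranked[0], ranked[1]
    return top_species if top - second >= min_margin else "ambiguous"

# Toy scores, invented for illustration only.
reads = {
    "r1": {"human": 150, "mouse": 120},
    "r2": {"human": 98, "mouse": 149},
    "r3": {"human": 140, "mouse": 140},
    "r4": {"human": None, "mouse": None},
}
labels = {name: classify_read(scores) for name, scores in reads.items()}
counts = Counter(labels.values())
```

In a workflow like the one the abstract describes, the reads labelled per species would then be re-aligned to that species' genome alone; how ties and near-ties are handled is a design choice this sketch leaves to `min_margin`.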

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9081140/pdf/nihms-1770823.pdf
Citations: 0
A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-04-01 | Epub Date: 2022-02-21 | DOI: 10.1142/S0219720022500032
Die Zhang, Shunfang Wang

Succinylation of proteins participates in the regulation of a variety of cellular processes. Identifying modified substrates and their precise sites is the basis for understanding the molecular mechanism and regulation of succinylation. In this work, we selected five superior feature encodings, CKSAAP, ACF, BLOSUM62, AAindex, and one-hot, according to their performance in succinylation site prediction. Then, an LSTM network and a CNN were used to construct four models: LSTM-CNN, CNN-LSTM, LSTM, and CNN. Each of the five selected features was input into each of these four models for training, so that the four models could be compared. Based on the performance of each model, the best of them was chosen to construct a hybrid model, DeepSucc, composed of five sub-modules for integrating heterogeneous information. Under 10-fold cross-validation, the hybrid model DeepSucc achieves 86.26% accuracy, 84.94% specificity, 87.57% sensitivity, 0.9406 AUC, and 0.7254 MCC. When compared with other prediction tools on an independent test set, DeepSucc outperformed them in sensitivity and MCC. The datasets and source code can be accessed at https://github.com/1835174863zd/DeepSucc.
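For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are made up for illustration and are not taken from the paper; the sketch also assumes both classes are present, so the sensitivity and specificity denominators are nonzero.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

Because MCC uses all four cells of the confusion matrix, it is often preferred over accuracy when positive and negative sites are imbalanced.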

Citations: 4
Tensor decomposition based on the potential low-rank and p-shrinkage generalized threshold algorithm for analyzing cancer multiomics data.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-04-01 | Epub Date: 2022-02-21 | DOI: 10.1142/S0219720022500020
Hang-Jin Yang, Yu-Xia Lei, Juan Wang, Xiang-Zhen Kong, Jin-Xing Liu, Ying-Lian Gao

Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition ([Formula: see text]-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. In addition, the tensor nuclear norm (TNN) defined on the basis of [Formula: see text]-SVD treats all singular values by the same standard. Because TNN ignores the differences among singular values, the main information that should be well preserved is not retained. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. The potential low-rank space may contain important information, so we learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers. The [Formula: see text]-shrinkage generalized threshold function is utilized to preserve the low-rank properties of larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set. The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.
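The abstract does not spell out the threshold function, so the Python sketch below uses one widely used form of the p-shrinkage operator, sign(x) * max(|x| - lam^(2-p) * |x|^(p-1), 0), purely as an assumption; the paper's exact definition may differ. It illustrates why such an operator penalizes large singular values less than ordinary soft thresholding (the p = 1 case). The name `p_shrink` and the sample values are hypothetical.

```python
def p_shrink(x, lam, p):
    """One common form of p-shrinkage (an assumption, not the paper's formula).

    For p = 1 this is ordinary soft thresholding; for p < 1 the amount
    subtracted decreases as |x| grows, so large values are shrunk less.
    """
    mag = abs(x)
    if mag == 0.0:
        return 0.0
    shrunk = max(mag - lam ** (2 - p) * mag ** (p - 1), 0.0)
    return shrunk if x > 0 else -shrunk

# Toy singular values, invented for illustration.
singular_values = [10.0, 2.0, 0.5]
soft = [p_shrink(s, lam=1.0, p=1.0) for s in singular_values]
half = [p_shrink(s, lam=1.0, p=0.5) for s in singular_values]
```

With these toy inputs the largest value loses 1.0 under soft thresholding but only about 0.32 under p = 0.5, while the smallest value is set to zero in both cases.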

Citations: 0
RNA modification writers influence tumor microenvironment in gastric cancer and prospects of targeted drug therapy
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-03-14 | DOI: 10.1142/S0219720022500044
P. Song, Sheng Zhou, Xiaoyang Qi, Y. Jiao, Y. Gong, Jie Zhao, Haojun Yang, Z. Qian, J. Qian, Liming Tang
Background: RNA adenosine modifications are crucial for regulating RNA levels. N6-methyladenosine (m6A), N1-methyladenosine (m1A), adenosine-to-inosine RNA editing, and alternative polyadenylation (APA) are four major RNA modification types. Methods: We evaluated the altered mRNA expression profiles of 27 RNA modification enzymes and, using unsupervised clustering, compared the differences in tumor microenvironment (TME) and clinical prognosis between two RNA modification patterns. Then, we constructed a scoring system, WM_score, to quantify the RNA modifications in patients with gastric cancer (GC), and associated WM_score with TME, clinical outcomes, and effectiveness of targeted therapies. Results: RNA adenosine modifications strongly correlated with TME and could predict the degree of TME cell infiltration, genetic variation, and clinical prognosis. Two modification patterns were identified according to high and low WM_scores. Tumors in the WM_score-high subgroup were closely linked with survival advantage, CD4[Formula: see text] T-cell infiltration, high tumor mutation burden, and cell cycle signaling pathways, whereas those in the WM_score-low subgroup showed strong infiltration of inflammatory cells and poor survival. Regarding the immunotherapy response, a high WM_score showed a significant correlation with PD-L1 expression, predicting the effect of PD-L1 blockade therapy. Conclusion: The WM_score system could facilitate scoring and prediction of GC prognosis.
Citations: 0
A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-03-09 | DOI: 10.1142/S0219720022500056
Santhosh Amilpur, Raju Bhukya
Enhancers are short regulatory DNA fragments that are bound by proteins called activators. They are free-bound and distant elements, which play a vital role in controlling gene expression. Identifying enhancers and their strength is challenging because of their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency still need improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches were concatenated and then passed through a simple artificial neural network (ANN) to identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on a benchmark chromatin dataset covering nine cell lines. A 10-fold cross-validation method is employed to evaluate the model's performance. The results show that the proposed model performs very well in predicting enhancers, with an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903, and also performs fairly well on an independent test when compared with all other existing methods.
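As an illustration of one of the named encodings, the Python sketch below computes a CKSNAP-style feature vector: for each gap size k, the frequency of every ordered nucleotide pair separated by k positions. It follows the usual definition of the encoding rather than the authors' code; the function name `cksnap`, the `k_max` default, and the example sequence are assumptions, and pairs containing characters outside the alphabet are simply skipped.

```python
from itertools import product

def cksnap(seq, k_max=2, alphabet="ACGT"):
    """Composition of k-spaced nucleic acid pairs for gaps k = 0..k_max.

    Returns a flat list of 16 * (k_max + 1) frequencies, ordered by gap
    and then by pair (AA, AC, AG, AT, CA, ...).
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(seq) - k - 1          # number of (i, i+k+1) positions
        for i in range(max(n_windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:                # ignore N or other symbols
                counts[pair] += 1
        features.extend(
            counts[p] / n_windows if n_windows > 0 else 0.0 for p in pairs
        )
    return features

vec = cksnap("ACGTACGT", k_max=1)   # 32 features: gap 0 and gap 1
```

A vector like this would typically be concatenated with the other encodings before being passed to a classifier, as the abstract describes for the ANN.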
Citations: 1
A new Bayesian approach for QTL mapping of family data.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-02-01 | Epub Date: 2021-11-19 | DOI: 10.1142/S021972002150030X
Daiane Aparecida Zuanetti, Luis Aparecido Milan

In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs' effects. The model considers the detailed familial dependence and does not rely on random effects. It combines the probability of Mendelian inheritance of the parents' genotypes with the correlation between flanking markers and QTLs. This is an advance over models that use only Mendelian segregation or only the correlation between markers and QTLs to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their locations, and their additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under the tested conditions, the proposed method outperforms the other methods in estimating the number of QTLs, the accuracy of the QTLs' positions, and the estimates of their effects. The results of the application of the proposed method to data sets exceeded all of our expectations.

Citations: 0
Identification of cancer-related module in protein-protein interaction network based on gene prioritization.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-02-01 | Epub Date: 2021-12-03 | DOI: 10.1142/S0219720021500311
Jingli Wu, Qi Zhang, Gaoshi Li

With the rapid development of deep sequencing technologies, a large amount of high-throughput data has become available for studying carcinogenic mechanisms at the molecular level. It is widely accepted that the development and progression of cancer are regulated by modules/pathways rather than by individual genes, and the identification of cancer-related active modules has received extensive attention. In this paper, we put forward an identification method, ModFinder, that integrates biological networks and gene expression profiles. More concretely, a gene scoring function is devised using a regression model with a [Formula: see text]-step random walk kernel, and the genes are ranked according to both their active scores and their degrees in the PPI network. Then a greedy algorithm, NSEA, is introduced to find an active module with a high score and strong connectivity. Experiments were performed on both simulated data and real biological data, i.e. breast cancer and cervical cancer. Compared with the previous methods SigMod, LEAN and RegMod, ModFinder shows competitive performance. It can successfully identify a well-connected module that contains a large proportion of cancer-related genes, including some well-known oncogenes or tumor suppressors enriched in cancer-related pathways.
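The abstract does not describe NSEA in detail, so the Python sketch below is not that algorithm. It only illustrates the general idea of greedily growing a connected, high-scoring module in a network: start from the top-scoring node and repeatedly add the best-scoring neighbour of the current module. The function name `greedy_module`, the `max_size` parameter, the stopping rule, and the toy scores and edges are all assumptions.

```python
def greedy_module(scores, edges, max_size=5):
    """Greedily grow a connected module from the highest-scoring node.

    scores: dict mapping node -> activity score (hypothetical values).
    edges: iterable of undirected (u, v) pairs.
    Stops when the best frontier node has a non-positive score,
    no neighbours remain, or max_size is reached.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seed = max(scores, key=scores.get)
    module = {seed}
    while len(module) < max_size:
        frontier = set().union(*(neighbors.get(n, set()) for n in module)) - module
        if not frontier:
            break
        best = max(frontier, key=lambda g: scores.get(g, 0.0))
        if scores.get(best, 0.0) <= 0.0:
            break
        module.add(best)
    return module

# Toy network with invented scores; node names are placeholders.
toy_scores = {"A": 2.5, "B": 1.8, "C": 1.2, "D": -0.4, "E": 2.0}
toy_edges = [("A", "C"), ("A", "D"), ("C", "B"), ("D", "E")]
module = greedy_module(toy_scores, toy_edges)
```

In this toy case node E has a high score but is never added, because it is reachable only through the low-scoring node D; this is the kind of trade-off between score and connectivity that module-finding methods have to balance.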

Citations: 0
Identifying duplications and lateral gene transfers simultaneously and rapidly.
IF 1 | Zone 4 (Biology) | Q3 Computer Science | Pub Date: 2022-02-01 | Epub Date: 2021-12-09 | DOI: 10.1142/S0219720021500335
Zhi-Zhong Chen, Fei Deng, Lusheng Wang

This paper deals with the problem of enumerating all minimum-cost LCA-reconciliations involving gene duplications and lateral gene transfers (LGTs) for a given species tree [Formula: see text] and a given gene tree [Formula: see text]. Previously, [Tofigh A, Hallett M, Lagergren J, Simultaneous identification of duplications and lateral gene transfers, IEEE/ACM Trans Comput Biol Bioinf 517-535, 2011.] gave a fixed-parameter algorithm for this problem that runs in [Formula: see text] time, where [Formula: see text] is the number of vertices in [Formula: see text], [Formula: see text] is the number of vertices in [Formula: see text], and [Formula: see text] is the minimum cost of an LCA-reconciliation between [Formula: see text] and [Formula: see text]. In this paper, by refining their algorithm, we obtain a new one for the same problem that finds and outputs the solutions in a compact form within [Formula: see text] time. In the most interesting case where [Formula: see text], our algorithm is [Formula: see text] times faster.
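As background for the term "LCA-reconciliation", the Python sketch below shows the classical LCA mapping of a gene tree into a species tree and the duplication test it induces. It deliberately ignores lateral gene transfers and cost minimization, so it is neither the algorithm of this paper nor that of Tofigh et al.; the tree encodings (a child-to-parent dict for the species tree, nested tuples for the gene tree), the function names, and the toy trees are assumptions.

```python
def ancestors(node, parent):
    """Return the path from node up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    """Lowest common ancestor of a and b in a child -> parent tree."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def lca_map(gene_tree, leaf_species, parent, dups):
    """Map a gene-tree node to a species-tree node via LCA mapping.

    gene_tree is a leaf name (str) or a (left, right) tuple.
    Internal nodes mapped to the same species node as one of their
    children are recorded in dups as inferred duplications.
    """
    if isinstance(gene_tree, str):
        return leaf_species[gene_tree]
    left, right = gene_tree
    s_left = lca_map(left, leaf_species, parent, dups)
    s_right = lca_map(right, leaf_species, parent, dups)
    here = lca(s_left, s_right, parent)
    if here in (s_left, s_right):
        dups.append(gene_tree)
    return here

# Toy species tree ((human, mouse)HM, chicken)root and a toy gene tree.
species_parent = {"human": "HM", "mouse": "HM", "HM": "root", "chicken": "root"}
gene_tree = (("h1", "m1"), ("h2", "c1"))
leaf_species = {"h1": "human", "m1": "mouse", "h2": "human", "c1": "chicken"}
duplications = []
root_image = lca_map(gene_tree, leaf_species, species_parent, duplications)
```

Here the gene-tree root maps to the species root, the same node as its right child, so it is flagged as a duplication; reconciliations that also allow transfers can explain the same gene tree with different event sets, which is why enumerating all minimum-cost solutions is the harder problem the abstract addresses.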

Citations: 0
Female reproduction-specific proteins, origins in marine species, and their evolution in the animal kingdom.
IF 1 Zone 4 (Biology) Q3 Computer Science Pub Date: 2022-02-01 Epub Date: 2022-01-12 DOI: 10.1142/S0219720022400017
Laura Rebeca Jimenez-Gutierrez

The survival of a species largely depends on the ability of individuals to reproduce, thus perpetuating their life history. The advent of metazoans (i.e., multicellular animals) brought about the evolution of specialized tissues and organs, which in turn led to the development of complex protein regulatory pathways. This study sought to elucidate the evolutionary relationships between female reproduction-associated proteins by analyzing the transcriptomes of representative species from a selection of marine invertebrate phyla. Our study identified more than 50 reproduction-related genes across a wide evolutionary spectrum, from Porifera to Vertebrata. Among these, a total of 19 sequences had not been previously reported in at least one phylum, particularly in Porifera. Moreover, most of the structural differences between these proteins did not appear to be determined by environmental pressures or reproductive strategies, but largely followed a distinguishable evolutionary pattern from sponges to mammals.

Citations: 0
Clarifying real receptor binding site between coronavirus HCoV-HKU1 and 9-O-Ac-Sia based on molecular docking.
IF 1 Zone 4 (Biology) Q3 Computer Science Pub Date: 2022-02-01 Epub Date: 2022-01-20 DOI: 10.1142/S0219720021500347
Xiaoyu Liu, Jingying Zhao, Sicong Li, Cai Wei, Shihang Wang, Xuanyu Xu, Yin Zheng, Xiangyu Deng, Wenliang Yuan, Xiaomin Zeng, Sihua Peng

HCoV-HKU1 is a β-coronavirus with low pathogenicity that usually causes respiratory disease. Whether the receptor binding site (RBS) of HCoV-HKU1 lies in the N-terminal domain (NTD) or the C-terminal domain (CTD) of the HCoV-HKU1 S protein remains controversial. To address this issue, we used molecular docking to dock the NTD and the CTD separately with 9-O-acetylated sialic acid (9-O-Ac-Sia); the results show that the RBS of HCoV-HKU1 is located in the NTD (amino acid residues 80-95 and 25-32). Our findings clarify the structural basis and molecular mechanism of HCoV-HKU1 infection, providing important information for the development of therapeutic antibody drugs and the design of vaccines.

Citations: 0