Algorithms for selecting breakpoint locations to optimize diversity in protein engineering by site-directed protein recombination.
Wei Zheng, Xiaoduan Ye, Alan M. Friedman, Chris Bailey-Kellogg
Computational Systems Bioinformatics Conference, 2007, pp. 31-40. DOI: 10.1142/9781860948732_0008

Protein engineering by site-directed recombination seeks to develop proteins with new or improved function, by accumulating multiple mutations from a set of homologous parent proteins. A library of hybrid proteins is created by recombining the parent proteins at specified breakpoint locations; subsequent screening/selection identifies hybrids with desirable functional characteristics. In order to improve the frequency of generating novel hybrids, this paper develops the first approach to explicitly plan for diversity in site-directed recombination, including metrics for characterizing the diversity of a planned hybrid library and efficient algorithms for optimizing experiments accordingly. The goal is to choose breakpoint locations to sample sequence space as uniformly as possible (which we argue maximizes diversity), under the constraints imposed by the recombination process and the given set of parents. A dynamic programming approach selects optimal breakpoint locations in polynomial time. Application of our method to optimizing breakpoints for an example biosynthetic enzyme, purE, demonstrates the significance of diversity optimization and the effectiveness of our algorithms.
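The abstract above describes a dynamic program that places breakpoints optimally in polynomial time. The following is a minimal sketch of that style of DP under simplifying assumptions: since the abstract does not give the paper's exact diversity metric, the objective here is a stand-in (spreading the parents' polymorphic columns evenly across segments), and all function names are illustrative, not the authors' own.

```python
def polymorphic(parents):
    """1 for each alignment column where the parents disagree, else 0."""
    return [1 if len(set(col)) > 1 else 0 for col in zip(*parents)]

def select_breakpoints(parents, k):
    """Pick k breakpoint positions over equal-length parent sequences,
    minimizing the squared deviation of each segment's polymorphic-column
    count from an even split. Returns (cost, sorted cut positions).
    Runs in O(k * L^2) time via dynamic programming."""
    poly = polymorphic(parents)
    L = len(poly)
    prefix = [0]
    for p in poly:
        prefix.append(prefix[-1] + p)
    target = prefix[L] / (k + 1)  # ideal polymorphic count per segment

    def seg_cost(i, j):  # segment covers columns i..j-1
        return (prefix[j] - prefix[i] - target) ** 2

    INF = float("inf")
    # best[b][j] = min cost to cover columns 0..j-1 using b segments
    best = [[INF] * (L + 1) for _ in range(k + 2)]
    back = [[-1] * (L + 1) for _ in range(k + 2)]
    best[0][0] = 0.0
    for b in range(1, k + 2):
        for j in range(b, L + 1):
            for i in range(b - 1, j):
                c = best[b - 1][i] + seg_cost(i, j)
                if c < best[b][j]:
                    best[b][j] = c
                    back[b][j] = i
    # walk the backpointers to recover the chosen cut positions
    cuts, j = [], L
    for b in range(k + 1, 0, -1):
        j = back[b][j]
        cuts.append(j)
    cuts.pop()  # drop the leading 0 boundary
    return best[k + 1][L], sorted(cuts)
```

For two parents "AAAA" and "AATT" with one breakpoint, the sketch cuts between the two polymorphic columns, giving one mismatch column per segment.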
Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.
Justin Wilson, Manhong Dai, Elvis Jakupovic, Stanley Watson, Fan Meng
Computational Systems Bioinformatics Conference, 2007, pp. 387-90. DOI: 10.1142/9781860948732_0039

Modern video cards and game consoles typically have much better performance-to-price ratios than general-purpose CPUs. The parallel processing capabilities of game hardware are well suited for high-throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Cancer molecular pattern discovery by subspace consensus kernel classification.
Xiaoxu Han
Computational Systems Bioinformatics Conference, 2007, pp. 55-65.

Efficient discovery of cancer molecular patterns is essential in molecular diagnostics. The characteristics of gene/protein expression data challenge traditional unsupervised classification algorithms. In this work, we describe a subspace consensus kernel clustering algorithm based on projected gradient nonnegative matrix factorization (PG-NMF). The algorithm is a consensus kernel hierarchical clustering (CKHC) method operating in the subspace generated by PG-NMF. It integrates convergence-sound parts-based learning with subspace and kernel-space clustering for microarray and proteomics data classification. We first integrated subspace and kernel methods by following our framework of input-space, subspace, and kernel-space clustering. We demonstrate more effective classification results from our algorithm by comparison with classic NMF and sparse-NMF classification and with supervised classification (KNN and SVM) on four benchmark cancer datasets. Our algorithm can generate a family of classification algorithms by selecting different transforms to generate subspaces and different kernel clustering algorithms to cluster the data.
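The PG-NMF factorization at the heart of the abstract above can be sketched in a few lines of NumPy. This is a simplified stand-in: it uses a fixed step size and random initialization, whereas practical projected-gradient NMF uses a line search to pick the step; the function name and parameters are illustrative.

```python
import numpy as np

def pg_nmf(V, rank, iters=200, step=1e-3, seed=0):
    """Minimal projected-gradient NMF sketch: factor V ~ W @ H with
    W, H >= 0 by gradient descent on ||V - W H||_F^2, projecting each
    update onto the nonnegative orthant with np.maximum."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        R = W @ H - V          # residual
        gW = R @ H.T           # gradient w.r.t. W
        gH = W.T @ R           # gradient w.r.t. H
        W = np.maximum(W - step * gW, 0.0)  # descend, then project
        H = np.maximum(H - step * gH, 0.0)
    return W, H
```

In the paper's setting, the rows of H (or columns of W) would then feed a consensus kernel clustering step; here only the factorization itself is sketched.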
Transcriptional profiling of definitive endoderm derived from human embryonic stem cells.
Huiqing Liu, Stephen Dalton, Ying Xu
Computational Systems Bioinformatics Conference, 2007, pp. 79-82.

Definitive endoderm (DE), the inner germ layer of the trilaminar embryo, forms the gastrointestinal tract and its derivatives, as well as the thyroid, thymus, pancreas, lungs, and liver. Studies of DE formation in Xenopus, zebrafish, and mouse suggest a molecular mechanism conserved among vertebrates. However, the corresponding analysis in human has not been extensively carried out. With the maturity of techniques for monitoring how human embryonic stem cells (hESCs) respond to signals that determine their pluripotency, proliferation, survival, and differentiation status, we are now able to conduct similar research in human. In this paper, we present an analysis of gene expression profiles obtained from two recent experiments to identify genes differentially expressed during hESC differentiation to DE. We have carried out a systematic study of these genes to understand the related transcriptional regulation and signaling pathways, using computational predictions and comparative genome analyses. Our preliminary results suggest a transcriptional profile of hESC-DE formation similar to that of other vertebrates.
Enhanced partial order curve comparison over multiple protein folding trajectories.
Hong Sun, Hakan Ferhatosmanoglu, Motonori Ota, Yusu Wang
Computational Systems Bioinformatics Conference, 2007, pp. 299-310.

Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produce huge amounts of folding simulation data, so there is a pressing need to interpret these data and identify novel folding features from them. In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. Our EPO algorithm addresses several new challenges presented by comparing the high-dimensional curves that arise from folding trajectories. A detailed case study applying our algorithm to the miniprotein Trp-cage(24) demonstrates that our algorithm can detect similarities at a rather low level and extract biologically meaningful folding events.
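The abstract above models each trajectory as a multi-dimensional curve. As a point of reference for what comparing such curves involves, here is a standard pairwise curve-alignment baseline (dynamic time warping); the EPO algorithm itself aligns many curves simultaneously and builds a partial order, which this sketch does not attempt.

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two multi-dimensional
    curves, given as arrays of shape (n, d) and (m, d). Classic
    O(n*m) dynamic program over a cumulative-cost table."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # pointwise distance
            D[i, j] = cost + min(D[i - 1, j],      # skip a point of a
                                 D[i, j - 1],      # skip a point of b
                                 D[i - 1, j - 1])  # match the points
    return D[n, m]
```

Two identical curves have distance zero; the distance grows as the curves (e.g., two folding trajectories in a shared coordinate space) diverge.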
Improvement in protein sequence-structure alignment using insertion/deletion frequency arrays.
Kyle Ellrott, Jun-tao Guo, Victor Olman, Ying Xu
Computational Systems Bioinformatics Conference, 2007, pp. 335-42.

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as an equally probable event. We have analyzed the alignment patterns of homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as Insertion/Deletion (Indel) Frequency Arrays (IFA). By applying IFA to the protein threading problem, we have been able to improve alignment accuracy, especially for proteins with low sequence identity.
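One way to picture an indel frequency array is as a per-position tally of gap events over a set of alignments, converted into position-specific gap penalties. The sketch below is a loose illustration of that idea only: the penalty form (negative log frequency with add-one smoothing) and all names are assumptions, not the paper's actual statistic.

```python
import math
from collections import Counter

def indel_frequency_array(alignments, target_len):
    """Hedged IFA sketch. `alignments` is a list of (target_row,
    template_row) pairs of equal-length aligned strings, with '-'
    marking gaps. Returns (deletion_penalties, insertion_penalties):
    one deletion penalty per target position, one insertion penalty
    per gap slot (before each position and after the last)."""
    del_counts = Counter()   # target residue aligned against a template gap
    ins_counts = Counter()   # gap in the target at this slot
    n = len(alignments)
    for tgt, tpl in alignments:
        pos = 0  # index into the ungapped target sequence
        for t_char, p_char in zip(tgt, tpl):
            if t_char != '-':
                if p_char == '-':
                    del_counts[pos] += 1
                pos += 1
            else:
                ins_counts[pos] += 1  # insertion before target position pos

    def penalty(counts, i):
        # frequent indel sites get a smaller penalty (add-one smoothing)
        return -math.log((counts[i] + 1) / (n + 2))

    return ([penalty(del_counts, i) for i in range(target_len)],
            [penalty(ins_counts, i) for i in range(target_len + 1)])
```

A threading aligner could then charge these position-specific penalties instead of a uniform gap cost, which is the spirit of the improvement the abstract reports.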
Deconvoluting the BAC-gene relationships using a physical map.
Yonghui Wu, Lan Liu, T. Close, S. Lonardi
Computational Systems Bioinformatics Conference, 2007, pp. 203-14. DOI: 10.1142/9781860948732_0023

MOTIVATION: The deconvolution of the relationships between BAC clones and genes is a crucial step in the selective sequencing of the regions of interest in a genome. It usually requires combinatorial pooling of unique probes obtained from the genes (unigenes) and screening of the BAC library using the pools in a hybridization experiment. Since several probes can hybridize to the same BAC, for the deconvolution to be achievable the pooling design has to handle a large number of positives. As a consequence, smaller pools need to be designed, which in turn increases the number of hybridization experiments, possibly making the entire protocol infeasible.
RESULTS: We propose a new algorithm capable of producing high-accuracy deconvolution even in the presence of a weak pooling design, i.e., when pools are rather large. The algorithm compensates for the decreased information in the hybridization data by taking advantage of a physical map of the BAC clones. We show that the right combination of combinatorial pooling and our algorithm not only dramatically reduces the number of pools required, but also successfully deconvolutes the BAC-gene relationships with almost perfect accuracy.
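The deconvolution step the abstract builds on is a group-testing decode: each BAC belongs to a known set of pools, and a probe's positive pools implicate every BAC whose pools all tested positive. A naive decode looks like the sketch below (names are illustrative); with many positives it returns ambiguous candidate sets, which is exactly the situation the paper resolves with the physical map.

```python
def deconvolve(bac_pools, positive_pools):
    """Naive group-testing decode. `bac_pools` maps each BAC id to
    the pools it was placed in; `positive_pools` lists the pools
    that hybridized with a given probe. A BAC is a candidate for
    the probe's gene iff all of its pools are positive."""
    pos = set(positive_pools)
    return [bac for bac, pools in bac_pools.items() if set(pools) <= pos]
```

For example, with BACs b1 in pools {1, 2} and b2 in pools {2, 3}, a probe whose positives are {1, 2} implicates only b1; if pools {1, 2, 3} all came back positive, both BACs would be candidates and extra evidence would be needed to disambiguate.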
Using indirect protein-protein interactions for protein complex prediction.
H. Chua, K. Ning, W. Sung, H. Leong, L. Wong
Computational Systems Bioinformatics Conference, 2007, pp. 97-109. DOI: 10.1142/9781860948732_0014

Protein complexes are fundamental for understanding principles of cellular organization. Accurate and fast protein complex prediction from PPI networks of increasing size can guide biological experiments toward the discovery of novel protein complexes. However, protein complex prediction from PPI networks is a hard problem, especially when the PPI network is noisy. We know from previous work that proteins that do not interact but share interaction partners (level-2 neighbors) often share biological functions. The strength of functional association can be estimated using a topological weight, FS-Weight. Here we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. All direct and indirect interactions are first weighted using the topological weight (FS-Weight). Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into it. Existing clustering algorithms can then be applied to this modified network. We also propose a novel algorithm that searches for cliques in the modified network and merges them to form clusters using a "partial clique merging" method. In this paper, we show that (1) using indirect interactions and topological weights to augment protein-protein interactions improves the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach should be very useful for protein complex prediction, especially for predicting novel protein complexes.
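The network-modification step described above (weight all pairs, drop weak direct edges, add strong level-2 edges) can be sketched as follows. The shared-partner score here is a simple Dice-style overlap standing in for FS-Weight, whose exact formula includes reliability-correction terms not reproduced here; the thresholds are illustrative.

```python
import itertools

def overlap_score(adj, u, v):
    """Dice-style shared-partner score for a pair (u, v);
    a simplified stand-in for FS-Weight."""
    nu, nv = adj[u], adj[v]
    inter = len(nu & nv)
    return 2.0 * inter / (len(nu) + len(nv)) if nu or nv else 0.0

def augment_network(adj, add_thresh=0.5, drop_thresh=0.1):
    """Modify a PPI network in the spirit of the abstract above:
    keep direct edges whose score clears drop_thresh, and add
    level-2 edges (non-adjacent pairs with common neighbors) whose
    score clears add_thresh. `adj` maps node -> set of neighbors."""
    nodes = list(adj)
    new_adj = {n: set() for n in nodes}
    for u, v in itertools.combinations(nodes, 2):
        direct = v in adj[u]
        s = overlap_score(adj, u, v)
        level2 = not direct and (adj[u] & adj[v])
        if (direct and s >= drop_thresh) or (level2 and s >= add_thresh):
            new_adj[u].add(v)
            new_adj[v].add(u)
    return new_adj
```

A clustering algorithm (or the paper's partial-clique-merging method) would then run on the returned network rather than on the raw PPI graph.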