
Computational systems bioinformatics. Computational Systems Bioinformatics Conference: latest articles

A HAUSDORFF-BASED NOE ASSIGNMENT ALGORITHM USING PROTEIN BACKBONE DETERMINED FROM RESIDUAL DIPOLAR COUPLINGS AND ROTAMER PATTERNS.
Jianyang Zeng, C. Tripathy, Pei Zhou, B. Donald
High-throughput structure determination based on solution Nuclear Magnetic Resonance (NMR) spectroscopy plays an important role in structural genomics. One of the main bottlenecks in NMR structure determination is the interpretation of NMR data to obtain a sufficient number of accurate distance restraints by assigning nuclear Overhauser effect (NOE) spectral peaks to pairs of protons. The difficulty in automated NOE assignment mainly lies in the ambiguities arising both from the resonance degeneracy of chemical shifts and from the uncertainty due to experimental errors in NOE peak positions. In this paper we present a novel NOE assignment algorithm, called HAusdorff-based NOE Assignment (HANA), that starts with a high-resolution protein backbone computed using only two residual dipolar couplings (RDCs) per residue [37, 39], employs a Hausdorff-based pattern matching technique to deduce similarity between experimental and back-computed NOE spectra for each rotamer from a statistically diverse library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Our algorithm runs in time O(tn^3 + tn log t), where t is the maximum number of rotamers per residue and n is the size of the protein. Application of our algorithm on biological NMR data for three proteins, namely, human ubiquitin, the zinc finger domain of the human DNA Y-polymerase Eta (pol η) and the human Set2-Rpb1 interacting domain (hSRI) demonstrates that our algorithm overcomes spectral noise to achieve more than 90% assignment accuracy. Additionally, the final structures calculated using our automated NOE assignments have backbone RMSD < 1.7 Å and all-heavy-atom RMSD < 2.5 Å from reference structures that were determined either by X-ray crystallography or traditional NMR approaches. These results show that our NOE assignment algorithm can be successfully applied to protein NMR spectra to obtain high-quality structures.
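The pattern-matching idea in the abstract above rests on the Hausdorff distance between two point sets. The following is a minimal sketch of the classical symmetric Hausdorff distance only; the paper may well use a different variant, and the peak coordinates and the names `experimental` and `back_computed` are hypothetical, chosen for illustration.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Classical symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical 2D peak positions; these are not data from the paper.
experimental = [(1.0, 4.0), (2.0, 8.0), (3.5, 7.5)]
back_computed = [(1.0, 4.5), (2.0, 8.0)]

# A smaller score means the back-computed pattern resembles the experiment more.
score = hausdorff(experimental, back_computed)
```

Under this sketch, a rotamer whose back-computed peak pattern yields a smaller score would be a better match to the experimental spectrum.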
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 2008 1; Pages: 169-181; Published: 2008-01-01; Type: Journal Article; DOI: https://doi.org/10.1142/9781848162648_0015
Citations: 4
Knowledge representation and data mining for biological imaging.
W. Ahmed
Biological and pharmaceutical research relies heavily on microscopically imaging cell populations for understanding their structure and function. Much work has been done on automated analysis of biological images, but image analysis tools are generally focused only on extracting quantitative information for validating a particular hypothesis. Images contain much more information than is normally required for testing individual hypotheses. The lack of symbolic knowledge representation schemes for representing semantic image information and the absence of knowledge mining tools are the biggest obstacles in utilizing the full information content of these images. In this paper we first present a graph-based scheme for integrated representation of semantic biological knowledge contained in cellular images acquired in spatial, spectral, and temporal dimensions. We then present a spatio-temporal knowledge mining framework for extracting non-trivial and previously unknown association rules from image data sets. This mechanism can change the role of biological imaging from a tool used to validate hypotheses to one used for automatically generating new hypotheses. Results for an apoptosis screen are also presented.
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7 1; Pages: 311-4; Published: 2008-01-01; Type: Journal Article; DOI: https://doi.org/10.1142/9781848162648_0027
Citations: 2
On the accurate construction of consensus genetic maps.
Yonghui Wu, Timothy J Close, Stefano Lonardi

We study the problem of merging genetic maps, when the individual genetic maps are given as directed acyclic graphs. The problem is to build a consensus map, which includes and is consistent with all (or, the vast majority of) the markers in the individual maps. When markers in the input maps have ordering conflicts, the resulting consensus map will contain cycles. We formulate the problem of resolving cycles in a combinatorial optimization framework, which in turn is expressed as an integer linear program. A faster approximation algorithm is proposed, and an additional speed-up heuristic is developed. According to an extensive set of experimental results, our tool is consistently better than JOINMAP, both in terms of accuracy and running time.
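The claim that ordering conflicts between input maps produce cycles in the merged map can be illustrated with a small sketch. It only merges edge sets and detects a cycle with Kahn's topological sort; it does not implement the paper's integer linear program or its approximation algorithm, and the marker names and the helpers `merge_maps` and `has_cycle` are hypothetical.

```python
from collections import defaultdict, deque

def merge_maps(maps):
    """Union the directed edges (marker_before, marker_after) of all maps."""
    edges = set()
    for m in maps:
        edges.update(m)
    return edges

def has_cycle(edges):
    """Return True if the directed graph has a cycle (Kahn's algorithm)."""
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        if v not in succ[u]:
            succ[u].add(v)
            indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Nodes that never reach in-degree zero lie on or behind a cycle.
    return visited < len(nodes)

# Hypothetical marker orders: map_b contradicts map_a on m2 vs m3.
map_a = [("m1", "m2"), ("m2", "m3")]
map_b = [("m3", "m2")]
map_c = [("m1", "m3")]

conflicting = has_cycle(merge_maps([map_a, map_b]))
consistent = has_cycle(merge_maps([map_a, map_c]))
```

Here `conflicting` is true because the two maps disagree on the order of m2 and m3, while `consistent` is false because map_c agrees with map_a.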

Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7; Pages: 285-96; Published: 2008-01-01; Type: Journal Article
Citations: 0
Voting algorithms for the motif finding problem.
Xiaowen Liu, Bin Ma, Lusheng Wang

Finding motifs in many sequences is an important problem in computational biology, especially in the identification of regulatory motifs in DNA sequences. Let c be a motif sequence. Given a set of sequences, each planted with a mutated version of c at an unknown position, the motif finding problem is to find these planted motifs and the original c. In this paper, we study the VM model of the planted motif problem, which was proposed by Pevzner and Sze. We give a simple Selecting One Voting algorithm and a more powerful Selecting k Voting algorithm. When the length of the motif and the number of input sequences are large enough, we prove that the two algorithms can find the unknown motif consensus with high probability. In the proof, we show why a large number of input sequences is so important for finding motifs, as most researchers believe. Experimental results on simulated data also support the claim. The Selecting k Voting algorithm is powerful but computationally intensive. To speed it up, we propose a progressive filtering algorithm, which improves the running time significantly and has good accuracy in finding motifs. Our experimental results show that the Selecting k Voting algorithm with progressive filtering performs very well in practice and outperforms some of the best-known algorithms.

Availability: The software is available upon request.
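To give a feel for why many input sequences help, here is a minimal sketch of per-column majority voting over already-located motif occurrences. This is not the Selecting One Voting or Selecting k Voting algorithm from the paper, whose details are not given in the abstract; the function `vote_consensus` and the example strings are hypothetical.

```python
from collections import Counter

def vote_consensus(occurrences):
    """Pick the most frequent letter in each column of equal-length strings."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*occurrences)
    )

# Hypothetical mutated copies of the motif "ACGTAC", one mutation each.
occurrences = ["ACGTAC", "ACGTTC", "AGGTAC", "ACGAAC", "ACCTAC"]
consensus = vote_consensus(occurrences)
```

Each copy is mutated at a different position, so every column still has a clear majority and the vote recovers the original motif. With fewer copies, a single mutation carries more weight in its column.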

Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7; Pages: 37-47; Published: 2008-01-01; Type: Journal Article
Citations: 0
A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.
Hongwei Huo, Qiao-Luan Xie, Xubang Shen, V. Stojkovic
This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed for representing the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the basis of this coding to improve the solution during the evolutionary process. The implicit parallelism and state superposition of quantum mechanics and the global search capability of the genetic algorithm are exploited to achieve efficient computation. A set of well-known test cases from BAliBASE2.0 is used as a reference to evaluate the efficiency of the QGMALIGN optimization. The QGMALIGN results have been compared with those of the most popular methods (CLUSTALX, SAGA, DIALIGN, SB_PIMA, and QGMALIGN). The results show that QGMALIGN performs well on the presented biological data. The addition of genetic operators to the quantum algorithm lowers the overall running time.
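The rotation gate mentioned above can be sketched for a single qubit with real amplitudes, as is common in quantum-inspired evolutionary algorithms. What each qubit encodes in QGMALIGN and how the rotation angle is chosen are not stated in the abstract, so the angle and the function names below are assumptions for illustration only.

```python
import math

def rotate_qubit(alpha, beta, theta):
    """Apply the 2x2 rotation gate to the amplitude pair (alpha, beta)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta

def observe_probabilities(alpha, beta):
    """Probabilities of observing state 0 and state 1."""
    return alpha ** 2, beta ** 2

# Start from an equal superposition and rotate by a hypothetical angle.
alpha, beta = 1 / math.sqrt(2), 1 / math.sqrt(2)
alpha, beta = rotate_qubit(alpha, beta, math.pi / 12)
p0, p1 = observe_probabilities(alpha, beta)
```

The rotation preserves `alpha**2 + beta**2 == 1` up to rounding, so the pair remains a valid probability encoding while the chance of observing state 1 shifts from 0.5 to 0.75.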
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7 1; Pages: 15-26; Published: 2008-01-01; Type: Journal Article; DOI: https://doi.org/10.1142/9781848162648_0002
Citations: 4
Detecting pathways transcriptionally correlated with clinical parameters.
I. Ulitsky, R. Shamir
The recent explosion in the number of clinical studies involving microarray data calls for novel computational methods for their dissection. Human protein interaction networks are rapidly growing and can assist in the extraction of functional modules from microarray data. We describe a novel methodology for extraction of connected network modules with coherent gene expression patterns that are correlated with a specific clinical parameter. Our approach suits both numerical (e.g., age or tumor size) and logical parameters (e.g., gender or mutation status). We demonstrate the method on a large breast cancer dataset, where we identify biologically-relevant modules related to nine clinical parameters including patient age, tumor size, and metastasis-free survival. Our method is capable of detecting disease-relevant pathways that could not be found using other methods. Our results support some previous hypotheses regarding the molecular pathways underlying diversity of breast tumors and suggest novel ones.
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7 1; Pages: 249-58; Published: 2008-01-01; Type: Journal Article; DOI: https://doi.org/10.1142/9781848162648_0022
Citations: 13
GaborLocal: peak detection in mass spectrum by Gabor filters and Gaussian local maxima.
Nha Nguyen, Heng Huang, Soontorn Oraintara, An Vo

Mass Spectrometry (MS) is increasingly being used to discover disease-related proteomic patterns. Peak detection is one of the most important steps in the typical analysis of MS data. Recently, many new algorithms have been proposed to increase the true positive rate while keeping the false positive rate low in peak detection. Most of them follow one of two approaches: denoising or decomposition. In previous studies, the decomposition approach has shown more potential than the denoising approach. In this paper, we propose a new method named GaborLocal, which can detect more true peaks with a very low false positive rate. Gaussian local maxima are employed for peak detection because they are robust to noise in signals. Moreover, the maximum rank of peaks is defined for the first time to identify peaks instead of using the signal-to-noise ratio, and the Gabor filter is used to decompose the raw MS signal. We apply the proposed method to a real SELDI-TOF spectrum with known polypeptide positions. The experimental results demonstrate that our method outperforms other commonly used methods on the receiver operating characteristic (ROC) curve.
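The robustness of Gaussian local maxima to noise can be shown with a small sketch: smooth the signal with a Gaussian kernel, then keep the local maxima. This covers only that one ingredient. The Gabor filter decomposition and the maximum-rank criterion of GaborLocal are not implemented, and the toy signal, `sigma`, and function names are assumptions.

```python
import math

def gaussian_smooth(signal, sigma=1.0):
    """Smooth with a truncated Gaussian kernel, renormalized at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = weight_sum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed

def local_maxima(values):
    """Indices that rise from the left and do not rise to the right."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]

# Toy intensities: two real peaks (indices 3 and 9) and a noise bump (index 6).
signal = [0, 0, 1, 5, 1, 0, 0.3, 0, 2, 9, 2, 0, 0]
raw_peaks = local_maxima(signal)
smoothed_peaks = local_maxima(gaussian_smooth(signal, sigma=1.0))
```

On this toy signal the raw local maxima include the noise bump at index 6, while the smoothed signal keeps only the two larger peaks.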

Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7; Pages: 85-96; Published: 2008-01-01; Type: Journal Article
Citations: 0
Consistent alignment of metabolic pathways without abstraction.
Ferhat Ay, Tamer Kahveci, Valerie de Crécy-Lagard

Pathways show how different biochemical entities interact with each other to perform vital functions for the survival of organisms. Similarities between pathways indicate functional similarities that are difficult to identify by comparing the individual entities that make up those pathways. When the interacting entities are of a single type, the problem of identifying similarities reduces to the graph isomorphism problem. However, for pathways with varying types of entities, such as metabolic pathways, the alignment problem is more challenging. Existing methods often address the metabolic pathway alignment problem by ignoring all the entities except for one type. This kind of abstraction significantly reduces the relevance of the alignment, as it causes losses in the information content. In this paper, we develop a method to solve the pairwise alignment problem for metabolic pathways. One distinguishing feature of our method is that it aligns reactions, compounds and enzymes without abstraction of pathways. We pursue the intuition that both pairwise similarities of entities (homology) and their organization (topology) are crucial for metabolic pathway alignment. In our algorithm, we account for both by creating an eigenvalue problem for each entity type. We enforce consistency by considering the reachability sets of the aligned entities. Our experiments show that our method finds biologically and statistically significant alignments on the order of seconds for pathways with approximately 100 entities.

Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7; Pages: 237-48; Published: 2008-01-01; Type: Journal Article
Citations: 0
Using relative importance methods to model high-throughput gene perturbation screens.
Ying Jin, Naren Ramakrishnan, L. Heath, R. Helm
With the advent of high-throughput gene perturbation screens (e.g., RNAi assays, genome-wide deletion mutants), modeling the complex relationship between genes and phenotypes has become a paramount problem. One broad class of methods uses 'guilt by association' to impute phenotypes to genes based on the interactions between the given gene and other genes with known phenotypes. But these methods are inadequate for genes that have no cataloged interactions but are nevertheless known to result in important phenotypes. In this paper, we present an approach that first models relationships between phenotypes using the notion of 'relative importance' and subsequently uses these derived relationships to make phenotype predictions. Besides improved accuracy on S. cerevisiae deletion mutants and C. elegans knock-down datasets, we show how our approach offers insight into relations between phenotypes.
Computational Systems Bioinformatics Conference, vol. 7, pp. 225-35, 2008. DOI: https://doi.org/10.1142/9781848162648_0020
Citations: 5
Fast multisegment alignments for temporal expression profiles.
A. Smith, M. Craven
We present two heuristics for speeding up a time-series alignment algorithm that is related to dynamic time warping (DTW). In previous work, we developed our multisegment alignment algorithm to answer similarity queries for toxicogenomic time-series data. Our multisegment algorithm returns more accurate alignments than DTW at the cost of time complexity: the multisegment algorithm is O(n^5), whereas DTW is O(n^2). The first heuristic speeds up our algorithm by a constant factor by restricting alignments to a cone shape in alignment space. The second heuristic restricts the alignments considered to those near one returned by a DTW-like method, which reduces the time complexity to O(n^3). Importantly, neither heuristic results in a loss of accuracy.
Computational Systems Bioinformatics Conference, vol. 7, pp. 315-26, 2008. DOI: https://doi.org/10.1142/9781848162648_0028
Citations: 8
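The Smith and Craven abstract contrasts its multisegment algorithm with quadratic-time DTW and speeds the former up by restricting the region of alignment space that is searched. The sketch below is not the authors' multisegment algorithm or either of their heuristics. It is only textbook DTW, with an optional fixed-width band (a Sakoe-Chiba-style window, swapped in here as an assumption) to illustrate how limiting the set of admissible alignments reduces the number of cells evaluated. The function name `dtw_distance` and the `window` parameter are hypothetical.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW cost between two numeric sequences.

    window: optional maximum |i - j| allowed between aligned indices
    (an assumed Sakoe-Chiba-style band). None means no restriction,
    which fills the full table in O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, or the final cell is unreachable.
    window = max(window, abs(n - m))

    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are evaluated.
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] aligned to an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned to an already-used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

With `window=None` every cell is filled. With a band of width w, roughly n·(2w + 1) cells are filled, so the banded cost can never be lower than the unrestricted cost, and it may be higher if the best alignment leaves the band. The abstract reports that the authors' heuristics avoid this loss on their data; the fixed band here carries no such guarantee.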