Proceedings of the ... Asia-Pacific bioinformatics conference: latest articles

Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding.
Nour Almadhoun Alserr, Gulce Kale, Onur Mutlu, Oznur Tastan, Erman Ayday

Researchers need a rich trove of genomic datasets that they can leverage to gain a better understanding of the genetic basis of the human genome and to identify associations between phenotypes and specific parts of DNA. However, sharing genomic datasets that include sensitive genetic or medical information of individuals can lead to serious privacy-related consequences if the data lands in the wrong hands. Restricting access to genomic datasets is one solution, but this greatly reduces their usefulness for research purposes. To allow sharing of genomic datasets while addressing these privacy concerns, several studies propose privacy-preserving mechanisms for data sharing. Differential privacy (DP) is one such mechanism: it provides a rigorous mathematical foundation for privacy guarantees when sharing aggregate statistical information about a dataset. Nevertheless, it has been shown that the original privacy guarantees of DP-based solutions degrade when there are dependent tuples in the dataset, which is a common scenario for genomic datasets (due to the presence of family members). In this work, we introduce a new mechanism that makes differentially private query results from genomic datasets with dependent tuples less vulnerable to inference attacks. We propose a utility-maximizing and privacy-preserving approach for sharing statistics that selectively hides SNPs of family members who participate in a genomic dataset. By evaluating our mechanism on a real-world genomic dataset, we empirically demonstrate that it can achieve up to 40% better privacy than state-of-the-art DP-based solutions, while near-optimally minimizing utility loss.
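For context, the standard Laplace mechanism that DP-based baselines commonly build on can be sketched as follows. This is a minimal illustration only, not the selective SNP hiding mechanism proposed in the paper; the function names, the genotype encoding (0, 1, or 2 minor alleles per individual), and the example count query are assumptions made for this sketch. The epsilon guarantee of this sketch presumes independent tuples, which is the assumption the abstract says breaks down when family members are in the dataset.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(genotypes, epsilon: float, rng: random.Random) -> float:
    """Noisy count of individuals carrying at least one minor allele.

    `genotypes` is a hypothetical list with one value (0, 1, or 2) per
    individual for a single SNP.
    """
    true_count = sum(1 for g in genotypes if g > 0)
    # Adding or removing one individual changes the count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(dp_count([0, 1, 2, 1, 0], epsilon=1.0, rng=rng))
```

A smaller epsilon gives a larger noise scale, so stronger nominal privacy costs more utility; this is the tradeoff the paper sets out to tune.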

Pub Date : 2023-04-01 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10306260/pdf/nihms-1902817.pdf
Citations: 0
The Future of Bioinformatics
Pub Date : 2021-08-01 DOI: 10.1126/article.63850
P. Bourne
Is bioinformatics a discipline? Perhaps a more fundamental question is: what is a discipline? According to Webster's New World Dictionary, it is "a system of rules, as for a monastic order." A little harsh, perhaps. A more apropos definition is a field of study that stands alone, yet at the same time can be integrated into a larger picture of human understanding. In this sense, is bioinformatics a discipline? Certainly a question worthy of debate. Most of this audience would say yes; others would say it is a part of computer or information science, others a methodology within the biological sciences, and others something destined to merge with medical informatics. If we are struggling for a unique definition, perhaps this indicates we want it to be something that stands alone and distinct? I will address this question. Regardless of how we classify bioinformatics, we are dealing with a fledgling enterprise that arose out of the human genome project, has become an interpreter of the genomic language of DNA, and is attempting to decipher the more complex languages in which proteins are the nouns, interactions the syntax, pathways the sentences, and living systems the complete volume. All fledglings learn how to discover, how to grow and adapt, and how to live in a complex world. Bioinformatics is no exception. We will explore where we are on this learning curve with examples from our work and that of others. The exploration covers methods development, putting the "bio" back in bioinformatics, quality control, and making the leap to systems biology.
Citations: 3
Predicting Nucleolar Proteins Using Support-Vector Machines
Pub Date : 2008-01-01 DOI: 10.1142/9781848161092_0005
M. Bodén
The intra-nuclear organisation of proteins is based on possibly transient interactions with morphologically defined compartments like the nucleolus. The fluidity of trafficking challenges the development of models that accurately identify compartment membership for novel proteins. A growing inventory of nucleolar proteins is here used to train a support-vector machine to recognise sequence features that allow the automatic assignment of compartment membership. We explore a range of sequence-kernels and find that while some success is achieved with a profile-based local alignment kernel, the problem is ill-suited to a standard compartment-classification approach.
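To make the idea of a sequence kernel concrete, here is a minimal sketch of a k-mer spectrum kernel, one of the simplest kernels that lets a support-vector machine compare protein sequences directly. The abstract does not list which kernels were explored beyond the profile-based local alignment kernel, so the spectrum kernel is a stand-in chosen for brevity, and the function names and example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    # Count every overlapping substring of length k in the sequence.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 2) -> int:
    # Dot product of the two k-mer count vectors; k-mers missing from one
    # sequence contribute zero.
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Shared 2-mers are "KK" and "KL", each occurring once in both sequences.
    print(spectrum_kernel("MKKL", "KKLM", k=2))
```

Evaluating such a kernel on every pair of training sequences yields a similarity matrix that a kernel classifier can consume without an explicit feature vector.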
Citations: 2
Integrating Hierarchical Controlled Vocabularies With OWL Ontology: A Case Study from the Domain of Molecular Interactions
Pub Date : 2008-01-01 DOI: 10.1142/9781848161092_0017
Melissa J. Davis, A. Newman, I. Khan, J. Hunter, M. Ragan
Many efforts at standardising terminologies within the biological domain have resulted in the construction of hierarchical controlled vocabularies that capture domain knowledge. Vocabularies such as the PSI-MI vocabulary capture both deep and extensive domain knowledge in the OBO (Open Biomedical Ontologies) format. However, hierarchical vocabularies represented in OBO, such as PSI-MI, only represent simple parent-child relationships between terms. By contrast, ontologies constructed using the Web Ontology Language (OWL), such as BioPax, define many richer types of relationships between terms. OWL provides a semantically rich structured language for expressing classes and sub-classes of entities and properties, relationships between them, and domain-specific rules or axioms that can be applied to extract new information through semantic inferencing. In order to fully exploit the domain knowledge inherent in domain-specific controlled vocabularies, they need to be represented as OWL-DL ontologies rather than in formats such as OBO. In this paper, we describe a method for converting OBO vocabularies into OWL and class instances represented as OWL-RDF triples. This approach preserves the hierarchical arrangement of the domain knowledge whilst also making the underlying parent-child relationships available to inferencing engines. This approach also has clear advantages over existing methods, which incorporate terms from external controlled vocabularies as literals stripped of the context associated with their place in the hierarchy. By preserving this context, we enable machine inferencing over the ordered domain knowledge captured in OBO controlled vocabularies.
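The general shape of such a conversion can be sketched as below. This is an illustrative simplification rather than the authors' converter: it reads only the `id`, `name`, and `is_a` tags of `[Term]` stanzas, emits plain (subject, predicate, object) tuples instead of serialized RDF, and the mapping of `is_a` to `rdfs:subClassOf` is an assumption of this sketch. The `EX:` identifiers in the sample are made up.

```python
def obo_terms_to_triples(obo_text: str):
    """Turn [Term] stanzas into (subject, predicate, object) tuples."""
    triples = []
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are converted.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
            triples.append((term_id, "rdf:type", "owl:Class"))
        elif in_term and term_id and line.startswith("name:"):
            triples.append((term_id, "rdfs:label", line[len("name:"):].strip()))
        elif in_term and term_id and line.startswith("is_a:"):
            # Drop the trailing "! parent name" comment that OBO allows.
            parent = line[len("is_a:"):].split("!")[0].strip()
            triples.append((term_id, "rdfs:subClassOf", parent))
    return triples


SAMPLE = """
[Term]
id: EX:0001
name: root term

[Term]
id: EX:0002
name: child term
is_a: EX:0001 ! root term
"""

if __name__ == "__main__":
    for triple in obo_terms_to_triples(SAMPLE):
        print(triple)
```

Because each term becomes a class with an explicit subclass link, a reasoner can follow the hierarchy instead of treating the term as an opaque literal.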
Citations: 4
CHEMICAL COMPOUND CLASSIFICATION WITH AUTOMATICALLY MINED STRUCTURE PATTERNS.
Pub Date : 2008-01-01 DOI: 10.1901/jaba.2008.6-39
A M Smalter, J Huan, G H Lushington

In this paper we propose new methods of chemical structure classification based on the integration of graph database mining from data mining and graph kernel functions from machine learning. In our method, we first identify a set of general graph patterns in chemical structure data. These patterns are then used to augment a graph kernel function that calculates the pairwise similarity between molecules. The resulting similarity matrix is used as input to classify chemical compounds via a kernel machine such as the support vector machine (SVM). Our results indicate that a pattern-based approach to graph similarity yields performance profiles comparable to, and sometimes exceeding, those of existing state-of-the-art approaches. In addition, the identification of highly discriminative patterns for activity classification provides evidence that our methods can make generalizations about a compound's function given its chemical structure. While we evaluated our methods on molecular structures, these methods are designed to operate on general graph data and hence could easily be applied to other domains in bioinformatics.
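The step from mined patterns to a similarity matrix can be illustrated with a deliberately simple pattern-overlap kernel. This is a sketch under stated assumptions, not the kernel from the paper: the paper uses patterns to augment a graph kernel, whereas here each molecule is reduced to a set of pattern labels and similarity is the number of shared patterns. The pattern labels and function names are hypothetical.

```python
def pattern_kernel(patterns_a: frozenset, patterns_b: frozenset) -> int:
    # Equivalent to the dot product of binary pattern-indicator vectors.
    return len(patterns_a & patterns_b)


def gram_matrix(molecules):
    # Pairwise similarity between all molecules, in input order.
    return [[pattern_kernel(a, b) for b in molecules] for a in molecules]


if __name__ == "__main__":
    molecules = [
        frozenset({"C-C", "C=O"}),
        frozenset({"C=O", "ring6", "C-N"}),
        frozenset({"ring6"}),
    ]
    for row in gram_matrix(molecules):
        print(row)
```

A matrix of this kind is what a kernel machine that accepts precomputed kernels would take as its training input.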

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2864492/pdf/nihms118197.pdf
Citations: 0
Proceedings of the 6th Asia-Pacific Bioinformatics Conference, APBC 2008, 14-17 January 2008, Kyoto, Japan
Pub Date : 2008-01-01 DOI: 10.1142/p544
A. Brazma, S. Miyano, T. Akutsu
Citations: 0
Discrimination of Native Folds Using Network Properties of Protein Structures
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0009
Alper Küçükural, O. U. Sezerman, A. Erçil
Graph-theoretic properties of proteins can be used to perceive the differences between correctly folded proteins and well-designed decoy sets. The 3D structures of proteins are represented as graphs. We used two different graph representations: Delaunay tessellations of proteins and contact map graphs. Graph-theoretic properties for both graph types showed high classification accuracy for protein discrimination. Fisher, linear, quadratic, neural network, and support vector classifiers were used for the classification of the protein structures. The best classifier accuracy was over 98%. Results showed that characteristic features of graph-theoretic properties can be used in the detection of native folds.
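A contact map graph of the kind mentioned above can be sketched as follows: residues are nodes, and two residues are connected when their representative atoms lie within a distance cutoff. The 7.0 cutoff (in angstroms), the use of one coordinate per residue, and the function names are assumptions of this sketch; the abstract does not state the parameters the authors used, and the sketch does not exclude sequence neighbours.

```python
import math


def contact_map_graph(coords, cutoff: float = 7.0):
    """Return the set of residue index pairs (i, j), i < j, within `cutoff`."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges


def degrees(n: int, edges):
    # Node degree is one simple graph-theoretic feature of a structure.
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    edges = contact_map_graph(coords)
    print(edges, degrees(len(coords), edges))
```

Features such as the degree distribution computed from this graph are the sort of input a classifier could use to separate native folds from decoys.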
Citations: 7
SPR-based Tree Reconciliation: Non-binary Trees and Multiple Solutions
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0027
Cuong V. Than, L. Nakhleh
The SPR (subtree prune and regraft) operation is used as the basis for reconciling incongruent phylogenetic trees, particularly for detecting and analyzing non-treelike evolutionary histories such as horizontal gene transfer, hybrid speciation, and recombination. The SPR-based tree reconciliation problem has been shown to be NP-hard, and several efficient heuristics have been designed to solve it. A major drawback of these heuristics is that for the most part they do not handle non-binary trees appropriately. Further, their computational efficiency suffers significantly when computing multiple optimal reconciliations. In this paper, we present algorithmic techniques for efficient SPR-based reconciliation of trees that are not necessarily binary. Further, we present divide-and-conquer approaches that enable efficient computing of multiple optimal reconciliations. We have implemented our techniques in the PhyloNet software package, which is publicly available at http://bioinfo.cs.rice.edu. The resulting method outperforms all existing methods in terms of speed, and performs at least as well as those methods in terms of accuracy.
Citations: 16
fRMSDAlign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0014
H. Rangwala, G. Karypis
As the sequence identity between a pair of proteins decreases, alignment strategies that are based on sequence and/or sequence profiles become progressively less effective in identifying the correct structural correspondence between residue pairs. This significantly reduces the ability of comparative modeling-based approaches to build accurate structural models. Incorporating predicted information about the local structure of the protein into the alignment process holds the promise of significantly improving the alignment quality of distant proteins. This paper studies the impact on the alignment quality of a new class of predicted local structural features that measure how well fixed-length backbone fragments centered around each residue pair align with each other. It presents a comprehensive experimental evaluation comparing these new features against existing state-of-the-art approaches utilizing profile-based and predicted secondary-structure information. It shows that for protein pairs with low sequence similarity (less than 12% sequence identity), the new structural features, alone or in conjunction with profile-based information, lead to alignments that are considerably better than those obtained by previous schemes.
Citations: 3
Optimal Algorithm for Finding DNA Motifs with Nucleotide Adjacent Dependency
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0035
Francis Y. L. Chin, Henry C. M. Leung, M. Siu, S. Yiu
Finding motifs and their corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models for representing a motif. However, both share an important weakness: they assume that the occurrence of a nucleotide in a binding site is independent of the other nucleotides. More complicated representations that can capture nucleotide dependency, such as HMMs or regular expressions, exist. Unfortunately, these models are not practical, because they have too many parameters and require many known binding sites. Recently, Chin and Leung introduced the SPSP representation, which overcomes the limitations of these complicated models. However, discovering novel motifs in the SPSP representation is still an NP-hard problem. In this paper, based on our observations of real binding sites, we propose the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture nucleotide dependency. We develop a branch-and-bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length-500 DNA sequences within a few minutes and that the DPS representation performs similarly to the SPSP representation.
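The independence weakness of the matrix representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the DPS or SPSP model from the paper. The toy binding sites and the helper names `position_matrix` and `matrix_probability` are assumptions made up for illustration. The sketch builds a column-wise frequency matrix and shows that it gives a never-observed site the same probability as an observed one when two adjacent positions are coupled.

```python
from collections import Counter


def position_matrix(sites):
    """Column-wise nucleotide frequencies for equal-length binding sites."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix


def matrix_probability(matrix, candidate):
    """Probability of a candidate site, treating positions as independent."""
    prob = 1.0
    for column, base in zip(matrix, candidate):
        prob *= column[base]
    return prob


# Toy sites (hypothetical): positions 0 and 1 are coupled, either A-C or G-T.
sites = ["ACT", "GTT", "ACT", "GTT"]
matrix = position_matrix(sites)

observed = matrix_probability(matrix, "ACT")    # appears among the sites
never_seen = matrix_probability(matrix, "ATT")  # mixes the two coupled pairs
```

Under these assumptions both candidates receive the same probability, because the matrix only stores per-position frequencies and cannot record that A at position 0 always co-occurs with C at position 1. A representation that captures adjacent dependency is meant to separate such cases.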
Citations: 4