Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning

2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.25

Chuang Wu, Andrew S. Walsh, R. Rosenfeld


Citations: 1

Abstract

RNA viruses such as HIV and influenza impose a very significant disease burden throughout the world. Identifying the key protein residue determinants that affect a given viral phenotype is an important step in learning the genotype-phenotype mapping and in making clinical decisions. This identification is currently done through a laborious experimental process that is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show its superior power over conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm that helps choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.
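The abstract does not give the algorithms themselves, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a binary phenotype, treats a candidate set of residue positions as "consistent" when no two labeled sequences agree at all of those positions yet differ in label, and swaps in a simple worst-case (minimax) criterion for choosing the next experiment, whereas the paper targets expected time or cost. All function names and toy sequences are hypothetical.

```python
from itertools import combinations


def is_consistent(positions, labeled):
    """True if no two sequences agree at every chosen position yet differ in label."""
    seen = {}
    for seq, label in labeled:
        key = tuple(seq[p] for p in positions)
        # setdefault stores the first label seen for this residue pattern
        if seen.setdefault(key, label) != label:
            return False
    return True


def filter_hypotheses(labeled, seq_len, max_size):
    """Enumerate position subsets up to max_size and keep the consistent ones."""
    return [
        subset
        for size in range(1, max_size + 1)
        for subset in combinations(range(seq_len), size)
        if is_consistent(subset, labeled)
    ]


def pick_experiment(hypotheses, labeled, candidates, labels=(0, 1)):
    """Pick the unlabeled sequence whose worst-case label leaves the fewest hypotheses.

    Assumption: a minimax criterion stands in for the expected-cost
    objective mentioned in the abstract.
    """
    def worst_case(seq):
        return max(
            sum(is_consistent(h, labeled + [(seq, y)]) for h in hypotheses)
            for y in labels
        )
    return min(candidates, key=worst_case)


# Toy data (invented): two labeled sequences that differ at every position,
# so each single position remains a possible determinant.
labeled = [("AKT", 1), ("GRS", 0)]
hypotheses = filter_hypotheses(labeled, seq_len=3, max_size=1)
next_seq = pick_experiment(hypotheses, labeled, ["AKT", "ARS"])
```

In this toy case, re-testing the already labeled "AKT" cannot rule anything out if the label comes back as 1, while "ARS" eliminates at least one candidate position whichever label is observed, so it is the sequence selected.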
