{"title":"利用组合滤波和主动学习技术鉴定病毒蛋白基因型决定因素","authors":"Chuang Wu, Andrew S. Walsh, R. Rosenfeld","doi":"10.1109/BIBE.2010.25","DOIUrl":null,"url":null,"abstract":"RNA viruses such as HIV, Influenza, impose very significant disease burden throughout the world. Identifying key protein residue determinants that affect a given viral phenotype is an important step in learning the genotype-phenotype mapping and making clinic decisions. This identification is currently done through a laborious experimental process which is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show the superior power to conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm to help choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.","PeriodicalId":330904,"journal":{"name":"2010 IEEE International Conference on BioInformatics and BioEngineering","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning\",\"authors\":\"Chuang Wu, Andrew S. Walsh, R. Rosenfeld\",\"doi\":\"10.1109/BIBE.2010.25\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RNA viruses such as HIV, Influenza, impose very significant disease burden throughout the world. Identifying key protein residue determinants that affect a given viral phenotype is an important step in learning the genotype-phenotype mapping and making clinic decisions. This identification is currently done through a laborious experimental process which is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show the superior power to conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm to help choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.\",\"PeriodicalId\":330904,\"journal\":{\"name\":\"2010 IEEE International Conference on BioInformatics and BioEngineering\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on BioInformatics and BioEngineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BIBE.2010.25\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on BioInformatics and BioEngineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BIBE.2010.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning
RNA viruses such as HIV, Influenza, impose very significant disease burden throughout the world. Identifying key protein residue determinants that affect a given viral phenotype is an important step in learning the genotype-phenotype mapping and making clinic decisions. This identification is currently done through a laborious experimental process which is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show the superior power to conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm to help choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.