Avocado
Jacob M. Schreiber, Timothy J. Durham, W. Noble, J. Bilmes
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020-09-21
DOI: 10.1145/3388440.3414215 (https://doi.org/10.1145/3388440.3414215)
Citations: 1
Abstract
In the past decade, high-throughput sequencing assays have allowed researchers to experimentally acquire thousands of functional measurements for each base pair in the human genome. Despite their value, these measurements cover only a small fraction of the experiments that could be performed, yet they are already too numerous to easily visualize or compute on. In a recent pair of publications [1,2], we address both challenges with Avocado, a deep neural network tensor factorization method that compresses these measurements into dense, information-rich representations. We demonstrate that these learned representations can be used to impute, with high accuracy, the output of tens of thousands of functional experiments that have not yet been performed. Further, we show that, on a variety of genomics tasks, machine learning models that leverage these learned representations outperform those trained directly on the functional measurements. The code is publicly available at https://github.com/jmschrei/avocado.
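
To make the idea of neural network tensor factorization concrete, the following is a minimal sketch and not the authors' implementation or the API of the linked repository. It assumes that each cell type, assay, and genomic position gets its own learned embedding, and that a small network maps the concatenated embeddings to a signal value. All names (`init_model`, `forward`, `train_step`, `run_demo`), the tensor sizes, the synthetic data, and the use of plain gradient descent in NumPy are assumptions made for illustration. One whole (cell type, assay) experiment is withheld during training and then imputed from the learned embeddings.

```python
import numpy as np

def init_model(n_cell_types, n_assays, n_positions, dim=4, hidden=8, seed=0):
    """Hypothetical model: one embedding table per tensor axis plus a small MLP."""
    rng = np.random.default_rng(seed)
    return {
        "cell": rng.normal(0, 0.1, (n_cell_types, dim)),
        "assay": rng.normal(0, 0.1, (n_assays, dim)),
        "pos": rng.normal(0, 0.1, (n_positions, dim)),
        "W1": rng.normal(0, 0.1, (3 * dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(model, c, a, p):
    """Predict the signal for index arrays (cell type, assay, position)."""
    x = np.concatenate([model["cell"][c], model["assay"][a], model["pos"][p]], axis=1)
    h = np.tanh(x @ model["W1"] + model["b1"])
    y = (h @ model["W2"] + model["b2"])[:, 0]
    return x, h, y

def train_step(model, c, a, p, target, lr=0.02):
    """One full-batch gradient-descent step on mean squared error.

    Returns the loss measured before the parameter update.
    """
    x, h, y = forward(model, c, a, p)
    err = y - target
    loss = float(np.mean(err ** 2))
    # Backpropagate through the output layer, the tanh layer, and the embeddings.
    dy = (2.0 / len(target)) * err[:, None]
    dW2, db2 = h.T @ dy, dy.sum(axis=0)
    dz = (dy @ model["W2"].T) * (1.0 - h ** 2)
    dW1, db1 = x.T @ dz, dz.sum(axis=0)
    dx = dz @ model["W1"].T
    dim = model["cell"].shape[1]
    for k, (name, idx) in enumerate((("cell", c), ("assay", a), ("pos", p))):
        grad = np.zeros_like(model[name])
        # The same embedding row is used by many entries, so accumulate gradients.
        np.add.at(grad, idx, dx[:, k * dim:(k + 1) * dim])
        model[name] -= lr * grad
    model["W1"] -= lr * dW1
    model["b1"] -= lr * db1
    model["W2"] -= lr * dW2
    model["b2"] -= lr * db2
    return loss

def run_demo(n_c=3, n_a=4, n_p=50, steps=500):
    """Train on a synthetic tensor with one experiment withheld, then impute it."""
    rng = np.random.default_rng(1)
    # Synthetic "functional measurements": a product of per-axis effects.
    signal = (rng.uniform(0.5, 1.5, n_c)[:, None, None]
              * rng.uniform(0.5, 1.5, n_a)[None, :, None]
              * rng.uniform(0.5, 1.5, n_p)[None, None, :])
    grids = np.meshgrid(np.arange(n_c), np.arange(n_a), np.arange(n_p), indexing="ij")
    c, a, p = (g.ravel() for g in grids)
    target = signal[c, a, p]
    # Withhold every position of one (cell type, assay) pair: an unperformed experiment.
    held_out = (c == n_c - 1) & (a == n_a - 1)
    train = ~held_out
    model = init_model(n_c, n_a, n_p)
    losses = [train_step(model, c[train], a[train], p[train], target[train])
              for _ in range(steps)]
    _, _, imputed = forward(model, c[held_out], a[held_out], p[held_out])
    return losses, imputed, target[held_out]

losses, imputed, truth = run_demo()
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"held-out MSE: {np.mean((imputed - truth) ** 2):.3f}")
```

In this sketch the embedding tables play the role of the compressed representations: once trained, the rows of `model["pos"]` could be passed as features to a separate downstream model instead of the raw measurements.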