Xiaodong Zhang, Chong Chu, Yao Zhang, Y. Wu, Jingyang Gao
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2016-12-01. DOI: 10.1109/BIBM.2016.7822495 (https://doi.org/10.1109/BIBM.2016.7822495)
Concod: Accurate consensus-based approach of calling deletions from high-throughput sequencing data
Accurately calling structural variations such as deletions from the short sequence reads produced by high-throughput sequencing is an important but challenging problem in genome analysis. Many methods for calling deletions exist, but at present no single method clearly outperforms all others in precision and sensitivity. A popular strategy, used by several authors, is to combine the different signatures left by deletions in order to call deletions more accurately. However, most existing methods that take this combining approach are heuristic, and the deletions they call still include many false positives. In this paper, we present Concod, a machine-learning-based framework for consensus-based deletion calling that more accurately detects true deletions and distinguishes them from falsely called ones. First, Concod collects candidate deletions by merging the output of multiple existing deletion-calling tools. Then, features of each candidate are extracted from the aligned reads based on multiple detection theories. Finally, a machine learning model is trained on these features and used to classify candidates as true or false. We test our approach on real data at different coverage levels and compare it with existing tools, including Pindel, SVseq2, BreakDancer, and DELLY. Results show that Concod significantly improves both the precision and the sensitivity of deletion calling.
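The first step, collecting candidates by merging the output of several callers, can be illustrated with a minimal Python sketch. The abstract does not say how Concod merges calls, so everything here is an assumption made for illustration: the input format (per-tool lists of `(chrom, start, end)` tuples with half-open coordinates), the `Candidate` class, the function names, and the 50% reciprocal-overlap rule. Feature extraction from aligned reads and the classifier are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A merged candidate deletion (hypothetical structure, not Concod's)."""
    chrom: str
    start: int
    end: int
    callers: set = field(default_factory=set)


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def merge_calls(calls_by_tool, min_overlap=0.5):
    """Merge per-tool deletion calls into a single candidate list.

    calls_by_tool maps a tool name to a list of (chrom, start, end) tuples.
    Assumed rule: a call joins the most recent candidate on the same
    chromosome when their reciprocal overlap is at least min_overlap.
    Comparing only against the most recent candidate is a simplification.
    """
    records = sorted(
        (chrom, start, end, tool)
        for tool, calls in calls_by_tool.items()
        for chrom, start, end in calls
    )
    merged = []
    for chrom, start, end, tool in records:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.chrom == chrom
            and reciprocal_overlap(last.start, last.end, start, end) >= min_overlap
        ):
            # Extend the candidate to the union and record the supporting tool.
            last.start = min(last.start, start)
            last.end = max(last.end, end)
            last.callers.add(tool)
        else:
            merged.append(Candidate(chrom, start, end, {tool}))
    return merged


# Made-up coordinates, for illustration only.
example_calls = {
    "Pindel": [("chr1", 100, 200), ("chr1", 1000, 1500)],
    "DELLY": [("chr1", 110, 210), ("chr2", 100, 200)],
}
for cand in merge_calls(example_calls):
    print(cand.chrom, cand.start, cand.end, sorted(cand.callers))
```

In this sketch each merged candidate keeps the set of tools that reported it, so a later stage could use that support alongside read-based features when a model decides whether the candidate is a true deletion.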