{"title":"工作流编排工作流:使用组态Web资源api的数千个查询及其容错性","authors":"Yassene Mohammed","doi":"10.1109/eScience.2018.00061","DOIUrl":null,"url":null,"abstract":"High throughput -omics like proteomics and genomics allow detailed molecular studies of organisms. Such studies are inherently on the Big Data side regarding volume and complexity. Following the FAIR principles and reaching for transparency in publication, raw data and results are often shared in public repositories. However, despite the steadily increased amount of shared omics data, it is still challenging to compare, correlate, and integrate it to answer new questions. Here we report on our experience in reusing and repurposing publically available proteomics and genomics data to design new targeted proteomics experiments. We have developed a scientific workflow to retrieve and integrate information from various repositories and domain knowledge-bases including UniPortKB [1], GPMDB [2], PRIDE [3], PeptideAtlas [4], ProteomicsDB [5], MassIVE [6], ExPASy [7], NCBI’s dbSNP [8], and PeptideTracker [9]. Following a “Map-Reduce” approach [10] the workflow select best proteotypic peptides for Multiple Reaction Monitoring (MRM) experiment. In an attempt to gain insights into the human proteome, we have designed a second workflow to orchestrate the selection workflow. 100,000s of queries were sent to online repositories to determine if peptides were seen in previous experiments. Fault tolerance ranged from dealing with no-reply to wrong annotations. Three months run of the workflow generated a comprehensive list of 165k+ suitable proteotypic peptides covering most human proteins. The main challenge has been the evolving APIs of the resources which continuously affects the components of our integrative bioinformatic solutions.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"145 1","pages":"299-300"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Workflows Orchestrating Workflows: Thousands of Queries and Their Fault Tolerance Using APIs of Omics Web Resources\",\"authors\":\"Yassene Mohammed\",\"doi\":\"10.1109/eScience.2018.00061\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High throughput -omics like proteomics and genomics allow detailed molecular studies of organisms. Such studies are inherently on the Big Data side regarding volume and complexity. Following the FAIR principles and reaching for transparency in publication, raw data and results are often shared in public repositories. However, despite the steadily increased amount of shared omics data, it is still challenging to compare, correlate, and integrate it to answer new questions. Here we report on our experience in reusing and repurposing publically available proteomics and genomics data to design new targeted proteomics experiments. We have developed a scientific workflow to retrieve and integrate information from various repositories and domain knowledge-bases including UniPortKB [1], GPMDB [2], PRIDE [3], PeptideAtlas [4], ProteomicsDB [5], MassIVE [6], ExPASy [7], NCBI’s dbSNP [8], and PeptideTracker [9]. Following a “Map-Reduce” approach [10] the workflow select best proteotypic peptides for Multiple Reaction Monitoring (MRM) experiment. In an attempt to gain insights into the human proteome, we have designed a second workflow to orchestrate the selection workflow. 
100,000s of queries were sent to online repositories to determine if peptides were seen in previous experiments. Fault tolerance ranged from dealing with no-reply to wrong annotations. Three months run of the workflow generated a comprehensive list of 165k+ suitable proteotypic peptides covering most human proteins. The main challenge has been the evolving APIs of the resources which continuously affects the components of our integrative bioinformatic solutions.\",\"PeriodicalId\":6476,\"journal\":{\"name\":\"2018 IEEE 14th International Conference on e-Science (e-Science)\",\"volume\":\"145 1\",\"pages\":\"299-300\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 14th International Conference on e-Science (e-Science)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/eScience.2018.00061\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 14th International Conference on e-Science (e-Science)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2018.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Workflows Orchestrating Workflows: Thousands of Queries and Their Fault Tolerance Using APIs of Omics Web Resources
High-throughput omics such as proteomics and genomics allow detailed molecular studies of organisms. In both volume and complexity, such studies are inherently Big Data. Following the FAIR principles and aiming for transparency in publication, raw data and results are often shared in public repositories. However, despite the steadily increasing amount of shared omics data, it remains challenging to compare, correlate, and integrate these data to answer new questions. Here we report on our experience in reusing and repurposing publicly available proteomics and genomics data to design new targeted proteomics experiments. We have developed a scientific workflow to retrieve and integrate information from various repositories and domain knowledge bases, including UniProtKB [1], GPMDB [2], PRIDE [3], PeptideAtlas [4], ProteomicsDB [5], MassIVE [6], ExPASy [7], NCBI’s dbSNP [8], and PeptideTracker [9]. Following a “Map-Reduce” approach [10], the workflow selects the best proteotypic peptides for Multiple Reaction Monitoring (MRM) experiments. To gain insight into the human proteome, we designed a second workflow to orchestrate the selection workflow. Hundreds of thousands of queries were sent to online repositories to determine whether peptides had been observed in previous experiments. Fault tolerance ranged from handling missing replies to correcting wrong annotations. A three-month run of the workflow generated a comprehensive list of more than 165,000 suitable proteotypic peptides covering most human proteins. The main challenge has been the evolving APIs of the resources, which continuously affect the components of our integrative bioinformatics solutions.
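The “Map-Reduce” selection step can be illustrated with a minimal sketch. The code below is our illustration, not the published workflow: it assumes each repository reply carries a per-peptide observation count and ranks peptides by summed evidence across resources, whereas the actual selection criteria follow [10] and weigh more properties than observation counts alone.

```python
from collections import defaultdict

def map_step(resource_records):
    """Map: emit (peptide, observation_count) pairs from one resource's reply."""
    for rec in resource_records:
        yield rec["peptide"], rec.get("observations", 0)

def reduce_step(pairs, top_n=5):
    """Reduce: sum evidence across resources and keep the top candidates."""
    totals = defaultdict(int)
    for peptide, count in pairs:
        totals[peptide] += count
    return sorted(totals, key=totals.get, reverse=True)[:top_n]

# Illustrative replies from two resources (sequences and counts are made up).
replies = [
    [{"peptide": "LVNEVTEFAK", "observations": 120},
     {"peptide": "TESTPEPTIDER", "observations": 3}],
    [{"peptide": "LVNEVTEFAK", "observations": 80}],
]
pairs = (pair for reply in replies for pair in map_step(reply))
print(reduce_step(pairs, top_n=1))  # -> ['LVNEVTEFAK']
```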
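The fault-tolerance theme of the title can likewise be sketched. The snippet below shows one common pattern for tolerating no-reply faults when sending many queries to a web API: retry with exponential backoff, and distinguish a definitive “not found” from a transient failure. The endpoint URL and function name are hypothetical placeholders; none of the listed resources’ real API paths are reproduced here.

```python
import time
import requests  # third-party HTTP library: pip install requests

# Hypothetical endpoint standing in for any of the REST APIs above
# (PRIDE, PeptideAtlas, ProteomicsDB, ...); real paths differ per
# resource and evolve over time, which is exactly what the
# orchestrating workflow must tolerate.
EXAMPLE_URL = "https://example.org/api/peptides/{sequence}"

def query_with_retries(sequence, retries=5, backoff=2.0, timeout=30):
    """Query one repository for one peptide, tolerating no-reply faults.

    Returns the parsed JSON record, or None if the peptide is unknown
    to the resource or the resource stayed unreachable after all retries.
    """
    url = EXAMPLE_URL.format(sequence=sequence)
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.status_code == 404:
                return None                # definitive: peptide not seen here
            resp.raise_for_status()        # treat 5xx as a transient fault
            return resp.json()
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
            time.sleep(backoff * 2 ** attempt)  # exponential backoff
    return None                            # give up after all retries; log and move on

if __name__ == "__main__":
    # LVNEVTEFAK is a well-known proteotypic peptide of human serum albumin.
    print(query_with_retries("LVNEVTEFAK") or "no usable reply")
```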