Iasmini Lima, Matheus Oliveira, Diego S. Kieckbusch, M. Holanda, M. E. Walter, Aleteia P. F. Araujo, M. Victorino, Waldeyr M. C. Silva, Sérgio Lifschitz
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2016-12-01. DOI: 10.1109/BIBM.2016.7822644 (https://doi.org/10.1109/BIBM.2016.7822644)
An evaluation of data replication for bioinformatics workflows on NoSQL systems
Many research projects in bioinformatics may be viewed as scientific workflows. Biologists often run the same workflow multiple times with different parameters to refine their data analysis. These executions generate a large volume of files in different formats, which need to be stored for future evaluation. New database models, such as NoSQL systems, are candidates for handling large volumes of data, particularly in distributed systems. This work assesses the impact of data replication on the execution of scientific workflows for two NoSQL database management systems: Cassandra and MongoDB.