Smart Data for Genomic Information Systems: the SILE Method
Ana León Palacio, O. P. López
Complex Syst. Informatics Model. Q. (Journal Article), published 2018-12-27
DOI: 10.7250/csimq.2018-17.01 (https://doi.org/10.7250/csimq.2018-17.01)
Citation count: 9
Abstract
Over the last two decades, data generated by Next Generation Sequencing technologies have revolutionized our understanding of human biology and advanced the study of how changes (variations) in DNA contribute to the risk of developing a given disease. A huge amount of genomic data is publicly available and is frequently used by the research community to extract meaningful and reliable gene-disease relationships. However, managing this exponential growth of data has become a challenge for biologists. From a Big Data perspective, they are forced to delve into a lake of complex data spread across more than a thousand heterogeneous repositories, represented in multiple formats and with varying levels of quality. Yet when data are used to solve a concrete problem, only a small part of that “data lake” is really significant; this is what we call the “smart” data perspective. Using conceptual models and the principles of data quality management, adapted to the genomic domain, we propose a systematic approach, called the SILE method, to move from a Big Data to a Smart Data perspective. The aim of this approach is to populate an Information System with genomic data that are accessible, informative and actionable enough to extract valuable knowledge.