Increasing Life Science Resources Re-Usability using Semantic Web Technologies
Marine Louarn, F. Chatonnet, Xavier Garnier, T. Fest, A. Siegel, O. Dameron
2019 15th International Conference on eScience (eScience), 2019-09-01. DOI: 10.1109/eScience.2019.00031 (https://doi.org/10.1109/eScience.2019.00031)
Citations: 4
Abstract
In the life sciences, current standardization and integration efforts are directed towards reference data and knowledge bases. However, the results of original studies are generally provided in non-standardized, study-specific formats. In addition, the formalization of analysis pipelines is often limited to textual descriptions in the methods sections. Both factors impair the reproducibility of the results, their maintenance, and their reuse for advancing other studies. Semantic Web technologies have proven effective at facilitating the integration and reuse of reference data and knowledge bases. We thus hypothesize that Semantic Web technologies also facilitate the reproducibility and reuse of life science studies involving pipelines that compute associations between entities according to intermediary relations and dependencies. To assess this hypothesis, we considered a case study in systems biology (http://regulatorycircuits.org), which provides tissue-specific regulatory interaction networks to elucidate perturbations across complex diseases. Our approach consisted of surveying the complete set of provided supplementary files to reveal the underlying structure between the biological entities described in the data. We relied on this structure and used Semantic Web technologies (i) to integrate the Regulatory Circuits data, and (ii) to formalize the analysis pipeline as SPARQL queries. Our result was a dataset of 335,429,988 triples on which two SPARQL queries were sufficient to extract each tissue-specific regulatory network.
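To make the idea of "computing associations between entities according to intermediary relations" concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It assumes a toy in-memory set of subject-predicate-object triples and mimics the kind of join a SPARQL graph pattern would express. All identifiers (`ex:bindsTo`, `ex:regulates`, `ex:activeIn`, the `tf:`, `region:`, `gene:` and `tissue:` names) are hypothetical and do not reflect the actual Regulatory Circuits schema.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data: transcription factors bind regions, regions
# regulate genes, and regions are active in particular tissues.
TRIPLES: Set[Triple] = {
    ("tf:TF1", "ex:bindsTo", "region:R1"),
    ("tf:TF2", "ex:bindsTo", "region:R2"),
    ("region:R1", "ex:regulates", "gene:G1"),
    ("region:R2", "ex:regulates", "gene:G2"),
    ("region:R1", "ex:activeIn", "tissue:T1"),
    ("region:R2", "ex:activeIn", "tissue:T2"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def tissue_network(tissue: str) -> Set[Tuple[str, str]]:
    """Return (transcription factor, gene) pairs linked through an
    intermediary region that is active in the given tissue."""
    edges: Set[Tuple[str, str]] = set()
    for region, _, _ in match(p="ex:activeIn", o=tissue):
        for tf, _, _ in match(p="ex:bindsTo", o=region):
            for _, _, gene in match(s=region, p="ex:regulates"):
                edges.add((tf, gene))
    return edges


print(tissue_network("tissue:T1"))
```

In a real triple store, the three nested loops would correspond to a single graph pattern with a shared `?region` variable, and the join would be left to the query engine rather than written by hand.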