George Alter, Darrell Donakowski, J. Gager, P. Heus, Carson Hunter, Sanda Ionescu, J. Iverson, H. Jagadish, C. Lagoze, Jared Lyle, Alexander Mueller, Sigbjørn Revheim, M. Richardson, Ørnulf Risnes, Karunakara Seelam, Dan J. Smith, T. Smith, Jie Song, Y. Vaidya, Ole Voldsater
IASSIST Quarterly. Published 2020-07-06. DOI: 10.29173/iq983 (https://doi.org/10.29173/iq983)
Provenance metadata for statistical data: An introduction to Structured Data Transformation Language (SDTL)
Structured Data Transformation Language (SDTL) provides structured, machine-actionable representations of the data transformation commands found in statistical analysis software. The Continuous Capture of Metadata for Statistical Data Project (C2Metadata) created SDTL as part of an automated system that captures provenance metadata from data transformation scripts and adds variable derivations to standard metadata files. SDTL also has potential for auditing scripts and for translating scripts between languages. SDTL is expressed as a set of JSON schemas, which are machine-actionable and easily serialized to other formats. Statistical software languages have a number of special features that have been carried into SDTL. We explain how SDTL handles differences among statistical languages and complex operations, such as merging files and reshaping data tables from “wide” to “long”.
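To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the actual SDTL schema. It shows (a) a language-neutral, JSON-serializable record of a single transformation command and (b) what a "wide" to "long" reshape does to a small table. All field names (`command`, `target`, `expression`, `source`) and both helper functions are hypothetical and chosen only for illustration; the real SDTL element names are defined in the project's JSON schemas.

```python
import json


def describe_compute(target, expression, source_language, source_text):
    """Build a hypothetical, simplified record of one 'compute' command.

    The keys are illustrative assumptions, NOT taken from the SDTL schemas.
    """
    return {
        "command": "Compute",
        "target": target,
        "expression": expression,
        "source": {"language": source_language, "text": source_text},
    }


def wide_to_long(rows, id_var, wide_vars, index_name, value_name):
    """Reshape rows from wide to long.

    wide_vars maps each index value (e.g. a year) to the wide column
    holding that value; one output row is produced per (id, index) pair.
    """
    long_rows = []
    for row in rows:
        for index_value, column in wide_vars.items():
            long_rows.append(
                {
                    id_var: row[id_var],
                    index_name: index_value,
                    value_name: row[column],
                }
            )
    return long_rows


# A command as it might appear in an SPSS script, recorded in neutral form.
record = describe_compute(
    target="bmi",
    expression="weight / (height ** 2)",
    source_language="SPSS",
    source_text="COMPUTE bmi = weight / (height ** 2).",
)
serialized = json.dumps(record, sort_keys=True)

# Two income columns (one per year) become one row per person and year.
wide = [
    {"id": 1, "inc2019": 100, "inc2020": 120},
    {"id": 2, "inc2019": 90, "inc2020": 95},
]
long_rows = wide_to_long(
    wide, "id", {2019: "inc2019", 2020: "inc2020"}, "year", "inc"
)
```

Because the record is plain JSON, it can be stored alongside other metadata or converted to another serialization; the sketch keeps the original command text next to the neutral description so that a derived variable can be traced back to the script line that produced it.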