{"title":"Introducing Data Science Techniques by Connecting Database Concepts and dplyr","authors":"Jennifer Broatch, S. Dietrich, Don Goelman","doi":"10.1080/10691898.2019.1647768","DOIUrl":null,"url":null,"abstract":"Abstract Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to illuminate the connection between these database concepts and the functionality provided by the dplyr package for R. Specifically, students are introduced to relational database concepts using visualizations that are specifically designed for students with no data science or computing background. These educational tools, which are freely available on the Web, engage students in the learning process through a dynamic presentation that gently introduces relational databases and how to ask questions of data stored in a relational database. The visualizations are specifically designed for self-study by students, including a formative self-assessment feature. Students are then assigned a corresponding statistics lesson to utilize statistical software in R within the dplyr framework and to emphasize the need for these database skills. This article describes a pilot experience of introducing this pedagogy into a calculus-based introductory statistics course for mathematics and statistics majors, and provides a brief evaluation of the student perspective of the experience. 
Supplementary materials for this article are available online.","PeriodicalId":45775,"journal":{"name":"Journal of Statistics Education","volume":"27 1","pages":"147 - 153"},"PeriodicalIF":2.2000,"publicationDate":"2019-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10691898.2019.1647768","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Statistics Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10691898.2019.1647768","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Social Sciences","Score":null,"Total":0}
Citation count: 12
Abstract
Early exposure to data science skills, such as relational databases, is essential for students in statistics as well as many other disciplines in an increasingly data-driven society. The goal of the presented pedagogy is to introduce undergraduate students to fundamental database concepts and to illuminate the connection between these database concepts and the functionality provided by the dplyr package for R. Specifically, students are introduced to relational database concepts using visualizations that are specifically designed for students with no data science or computing background. These educational tools, which are freely available on the Web, engage students in the learning process through a dynamic presentation that gently introduces relational databases and how to ask questions of data stored in a relational database. The visualizations are specifically designed for self-study by students, including a formative self-assessment feature. Students are then assigned a corresponding statistics lesson to utilize statistical software in R within the dplyr framework and to emphasize the need for these database skills. This article describes a pilot experience of introducing this pedagogy into a calculus-based introductory statistics course for mathematics and statistics majors, and provides a brief evaluation of the student perspective of the experience. Supplementary materials for this article are available online.
About the journal:
The "Datasets and Stories" department of the Journal of Statistics Education provides a forum for exchanging interesting datasets and discussing ways they can be used effectively in teaching statistics. This section of JSE is described fully in the article "Datasets and Stories: Introduction and Guidelines" by Robin H. Lock and Tim Arnold (1993). The Journal of Statistics Education maintains a Data Archive that contains the datasets described in "Datasets and Stories" articles, as well as additional datasets useful to statistics teachers. Lock and Arnold (1993) describe several criteria that will be considered before datasets are placed in the JSE Data Archive.