Pub Date: 2021-09-02 | DOI: 10.1080/26939169.2021.1995545
Sabrina Luxin Wang, A. Y. Zhang, Samuel Messer, A. Wiesner, Dennis K. Pearl
Abstract This article describes a suite of student-created Shiny apps for teaching statistics and a field test of their short-term effectiveness. To date, more than 50 Shiny apps and a growing collection of associated lesson plans, designed to enrich the teaching of both introductory and upper division statistics courses, have been developed. The apps are available for free use and their open source code can be adapted as desired. We report on the experimental testing of four of these Shiny apps to examine short-term learning outcomes in an introductory statistical concepts course.
{"title":"Student-Developed Shiny Applications for Teaching Statistics","authors":"Sabrina Luxin Wang, A. Y. Zhang, Samuel Messer, A. Wiesner, Dennis K. Pearl","doi":"10.1080/26939169.2021.1995545","DOIUrl":"https://doi.org/10.1080/26939169.2021.1995545","url":null,"abstract":"Abstract This article describes a suite of student-created Shiny apps for teaching statistics and a field test of their short-term effectiveness. To date, more than 50 Shiny apps and a growing collection of associated lesson plans, designed to enrich the teaching of both introductory and upper division statistics courses, have been developed. The apps are available for free use and their open source code can be adapted as desired. We report on the experimental testing of four of these Shiny apps to examine short-term learning outcomes in an introductory statistical concepts course.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"218 - 227"},"PeriodicalIF":1.7,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45773220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-02 | DOI: 10.1080/26939169.2021.1997128
Jacopo Di Iorio, S. Vantini
Abstract In this article, we discuss our attempt to teach applied statistics techniques typically taught in advanced courses, such as clustering and principal component analysis, to a non-mathematically educated audience. Given our students' negative attitude toward, and limited inclination for, mathematical disciplines, we introduce our topics through four different games. The four games are all user-centric, score-based arcade experiences intended to be played under the supervision of an instructor. They are developed using the Shiny web-based application framework for R. In every activity, students follow the instructions and interact with plots to minimize a score with a statistical meaning. No knowledge beyond elementary geometry and Euclidean distance is required to complete the tasks. Results from a student questionnaire give us some confidence that the experience benefited students, not only in their ability to understand and use the methods presented but also in their confidence and overall satisfaction with the course. This suggests that these or similar activities could greatly improve the diffusion of statistical thinking at different levels of education.
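The article does not give the exact scoring formulas, but for a clustering game one natural "score with a statistical meaning" is the within-cluster sum of squared Euclidean distances. A minimal sketch of such a score (the function name, the toy points, and the center placements are illustrative assumptions, and Python stands in for the R/Shiny implementation):

```python
def wcss(points, centers, assignment):
    """Within-cluster sum of squared Euclidean distances: the kind of
    'score with a statistical meaning' a clustering game could ask
    players to minimize by moving cluster centers."""
    total = 0.0
    for (x, y), c in zip(points, assignment):
        cx, cy = centers[c]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

# Two visible groups of points.
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
# A poor placement: centers far from both groups.
bad = wcss(points, [(3, 3), (9, 9)], [0, 0, 1, 1])
# Dragging the centers toward the groups lowers the score.
good = wcss(points, [(0, 0.5), (5.5, 5)], [0, 0, 1, 1])
```

Lowering the score by dragging centers toward the visible groups is exactly the kind of plot-interaction loop the games are built around.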
{"title":"How to Get Away With Statistics: Gamification of Multivariate Statistics","authors":"Jacopo Di Iorio, S. Vantini","doi":"10.1080/26939169.2021.1997128","DOIUrl":"https://doi.org/10.1080/26939169.2021.1997128","url":null,"abstract":"Abstract In this article, we discuss our attempt to teach applied statistics techniques typically taught in advanced courses, such as clustering and principal component analysis, to a non-mathematical educated audience. Considering the negative attitude and inclination toward mathematical disciplines of our students we introduce them to our topics using four different games. The four games are all user-centric, score-based arcade experiences intended to be played under the supervision of an instructor. They are developed using the Shiny web-based application framework for R. In every activity students have to follow the instructions and to interact with plots to minimize a score with a statistical meaning. No other knowledge than elementary geometry and Euclidean distance is required to complete the tasks. Results from a student questionnaire give us some confidence that the experience has benefited students, not only in terms of their ability to understand and use the explained methods but also regarding their confidence and overall satisfaction with the course. This fact suggests that these or similar activities could greatly improve the diffusion of statistical thinking at different levels of education.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"241 - 250"},"PeriodicalIF":1.7,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42389743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-02 | DOI: 10.1080/26939169.2021.1965509
Hollylynne S. Lee, Taylor Harrison
Abstract This study provides a glimpse into the professional learning, beliefs, and practices of high school teachers of Advanced Placement (AP) Statistics. Data are from a survey of 445 AP Statistics teachers in late 2018. Results indicate many AP Statistics teachers have taken several statistics courses and engage in professional development related to statistics sponsored by the College Board (summer institutes, exam readings, and online community). They generally do not engage with resources developed by the American Statistical Association and the statistics education community. While AP Statistics teachers structure class time with student–student interaction and use student-centered activities, they generally do not use statistics-specific technology tools and rarely engage students with datasets larger than 100 cases or with multiple variables. Teachers’ beliefs about teaching statistics do not always reflect their teaching practices. Personal time to improve, time with students (especially those on a blocked semester schedule), structure of curriculum and exam schedule, and lack of access to technology often prevent teachers from making changes to their practices. Findings call for targeted efforts to reach high school statistics teachers, engage them more in the statistics education community, and encourage curriculum and instructional approaches that more closely align with recommendations and trends in college-level introductory statistics.
{"title":"Trends in Teaching Advanced Placement Statistics: Results from a National Survey","authors":"Hollylynne S. Lee, Taylor Harrison","doi":"10.1080/26939169.2021.1965509","DOIUrl":"https://doi.org/10.1080/26939169.2021.1965509","url":null,"abstract":"Abstract This study provides a glimpse into the professional learning, beliefs, and practices of high school teachers of Advanced Placement (AP) Statistics. Data are from a survey of 445 AP Statistics teachers in late 2018. Results indicate many AP Statistics teachers have taken several statistics courses and engage in professional development related to statistics sponsored by the College Board (summer institutes, exam readings, and online community). They generally do not engage with resources developed by the American Statistical Association and the statistics education community. While AP statistics teachers structure class time with student–student interaction and use student-centered activities, they generally do not use statistics-specific technology tools and rarely engage students with datasets larger than 100 cases or with multiple variables. Teachers’ beliefs about teaching statistics do not always reflect their teaching practices. Personal time to improve, time with students (especially those on a blocked semester schedule), structure of curriculum and exam schedule, and lack of access to technology often prevent teachers from making changes to their practices. Findings call for targeted efforts to reach high school statistics teachers, engage them more in the statistics education community, and encourage curriculum and instructional approaches that more closely align with recommendations and trends in college-level introductory statistics.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"317 - 327"},"PeriodicalIF":1.7,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48294609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-02 | DOI: 10.1080/26939169.2021.1999871
D. Gerbing
ABSTRACT R and Python are commonly used software languages for data analytics. Using these languages as the course software for the introductory course gives students practical skills for applying statistical concepts to data analysis. However, the reliance upon the command line is perceived by the typical nontechnical introductory student as sufficiently esoteric that its use detracts from the teaching of statistical concepts and data analysis. An R package was developed based on the successive feedback of hundreds of introductory statistics students over multiple years to provide a set of functions that apply basic statistical principles with command-line R. The package offers gentler error checking and many visualizations and analytics, successfully serving as the course software for teaching and homework. This software includes pedagogical functions, data analytic functions for a variety of analyses, and the foundation for access to the entire R ecosystem and, by extension, any command-line environment.
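The article's R package and its function names are not reproduced here, but the idea of "gentler error checking" can be sketched as a pedagogical wrapper that validates input and reports actionable messages before a novice ever sees a cryptic failure (the function name and messages are hypothetical, and Python stands in for R):

```python
def friendly_mean(values):
    """Pedagogical wrapper in the spirit of 'gentler error checking':
    catch the mistakes a novice typically makes and explain them in
    plain language instead of letting a cryptic error surface."""
    if not isinstance(values, (list, tuple)):
        raise TypeError(
            "friendly_mean expects a list of numbers, e.g. [1, 2, 3]; "
            f"got {type(values).__name__} instead."
        )
    if len(values) == 0:
        raise ValueError("friendly_mean needs at least one number; the list is empty.")
    bad = [v for v in values if not isinstance(v, (int, float))]
    if bad:
        raise TypeError(f"These entries are not numbers: {bad!r}. Remove or fix them.")
    return sum(values) / len(values)
```

The design point is that the wrapper trades a little generality for messages tied to the student's actual mistake, which is what lets command-line software serve as introductory course software.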
{"title":"Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond","authors":"D. Gerbing","doi":"10.1080/26939169.2021.1999871","DOIUrl":"https://doi.org/10.1080/26939169.2021.1999871","url":null,"abstract":"ABSTRACT R and Python are commonly used software languages for data analytics. Using these languages as the course software for the introductory course gives students practical skills for applying statistical concepts to data analysis. However, the reliance upon the command line is perceived by the typical nontechnical introductory student as sufficiently esoteric that its use detracts from the teaching of statistical concepts and data analysis. An R package was developed based on the successive feedback of hundreds of introductory statistics students over multiple years to provide a set of functions that apply basic statistical principles with command-line R. The package offers gentler error checking and many visualizations and analytics, successfully serving as the course software for teaching and homework. This software includes pedagogical functions, data analytic functions for a variety of analyses, and the foundation for access to the entire R ecosystem and, by extension, any command-line environment.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"251 - 266"},"PeriodicalIF":1.7,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47377963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-02 | DOI: 10.1080/26939169.2021.1967229
H. Hoffman, Angelo F. Elmi
Abstract Teaching students statistical programming languages while simultaneously teaching them how to debug erroneous code is challenging. The traditional programming course focuses on error-free learning in class while students’ experiences outside of class typically involve error-full learning. While error-free teaching consists of focused lectures emphasizing correct coding, error-full teaching would follow such lectures with debugging sessions. We aimed to explore these two approaches by conducting a pilot study of 18 graduate students who voluntarily attended a SAS programming seminar held weekly from September 2018 through November 2018. Each seminar had a 10-min error-free lecture, 15-min programming assignment, 5-min break, 10-min error-full lecture, and 15-min programming assignment. We examined student performance and preference. While four students successfully completed both assignments and ten students did not successfully complete either assignment, one student successfully completed only the first assignment that directly followed the error-free lecture and three students successfully completed only the second assignment that directly followed the error-full lecture. Of the 15 students who responded, twelve (80%) preferred error-full to error-free learning. We will evaluate error-full learning on a larger scale in an introductory SAS course. Supplemental files are available online for this article.
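An error-full session pairs deliberately broken code with its debugged counterpart. The study used SAS; purely to illustrate the format, here is a classic bug-and-fix pair in Python (the function names and the chosen bug are our own, not the seminar's materials):

```python
def running_totals_buggy(xs, acc=[]):      # BUG: mutable default argument
    """Erroneous version a debugging session would hand to students:
    the default list persists across calls, so state leaks."""
    total = acc[-1] if acc else 0
    for x in xs:
        total += x
        acc.append(total)
    return acc

def running_totals_fixed(xs, acc=None):    # FIX: create a fresh list per call
    """Debugged version: None sentinel avoids the shared default."""
    acc = [] if acc is None else list(acc)
    total = acc[-1] if acc else 0
    for x in xs:
        total += x
        acc.append(total)
    return acc
```

Students first see the buggy version fail on a second call, then articulate why the fix works, which is the lecture-then-debugging rhythm the seminar tested.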
{"title":"Do Students Learn More from Erroneous Code? Exploring Student Performance and Satisfaction in an Error-Free Versus an Error-full SAS® Programming Environment","authors":"H. Hoffman, Angelo F. Elmi","doi":"10.1080/26939169.2021.1967229","DOIUrl":"https://doi.org/10.1080/26939169.2021.1967229","url":null,"abstract":"Abstract Teaching students statistical programming languages while simultaneously teaching them how to debug erroneous code is challenging. The traditional programming course focuses on error-free learning in class while students’ experiences outside of class typically involve error-full learning. While error-free teaching consists of focused lectures emphasizing correct coding, error-full teaching would follow such lectures with debugging sessions. We aimed to explore these two approaches by conducting a pilot study of 18 graduate students who voluntarily attended a SAS programming seminar held weekly from September 2018 through November 2018. Each seminar had a 10-min error-free lecture, 15-min programming assignment, 5-min break, 10-min error-full lecture, and 15-min programming assignment. We examined student performance and preference. While four students successfully completed both assignments and ten students did not successfully complete either assignment, one student successfully completed only the first assignment that directly followed the error-free lecture and three students successfully completed only the second assignment that directly followed the error-full lecture. Of the 15 students who responded, twelve (80%) preferred error-full to error-free learning. We will evaluate error-full learning on a larger scale in an introductory SAS course. Supplemental files are available online for this article.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"228 - 240"},"PeriodicalIF":1.7,"publicationDate":"2021-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47691692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-06 | DOI: 10.1080/26939169.2021.1959224
J. Witmer
{"title":"Note from the Editor","authors":"J. Witmer","doi":"10.1080/26939169.2021.1959224","DOIUrl":"https://doi.org/10.1080/26939169.2021.1959224","url":null,"abstract":"","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"155 - 155"},"PeriodicalIF":1.7,"publicationDate":"2021-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41647866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-06 | DOI: 10.1080/26939169.2021.1930812
Tiffany Xiao, Yifan Ma
As Big Data continues to rise in popularity, so does the need for protection against potential misuses of data. We are a group of undergraduate Statistical and Data Science majors from Smith College who are actively engaged in ethical discussions concerning the use of data in our society. It can be challenging to predict future trends and technologies in data science that could cause concerns. However, we believe that some essential protections and procedures should be in place to help prevent misuses of data. In particular, we are writing to you to address our concerns with the article “OkCupid Data for Introductory Statistics and Data Science Courses” by Albert Y. Kim and Adriana Escobedo-Land, published in your journal (Kim and Escobedo-Land 2015). In light of ethical concerns surrounding the article, we describe here how the dataset was found to contain identifiable information. We communicated this to the authors, who corrected the article accordingly. In our opinion, there is no doubt that the dataset presented in the article holds pedagogical as well as research value. One aspect of its educational value is that the context of possible analyses could better drive students’ interests. Its research value lies in the self-reported nature of the data, which is usually the private property of corporations and can be hard for university researchers to obtain. The dataset also retains pedagogical value as a case study for discussing the ethical implications of such data, and even for practicing anonymization skills. However, we believe that further anonymization of the dataset was necessary before it could be used for pedagogical purposes.
Some ways that datasets like this one could be better anonymized in the future include removing unimportant variables whose identification power is disproportionate to their research value. For example, in the case of the OkCupid dataset associated with the paper, the time the data was collected could be removed, since this fact is not particularly essential but can be used for identification. Other sources of concern for this dataset are the variables that reveal geographical and temporal information on individuals. Another method could
{"title":"A Letter to the Journal of Statistics and Data Science Education — A Call for Review of “OkCupid Data for Introductory Statistics and Data Science Courses” by Albert Y. Kim and Adriana Escobedo-Land","authors":"Tiffany Xiao, Yifan Ma","doi":"10.1080/26939169.2021.1930812","DOIUrl":"https://doi.org/10.1080/26939169.2021.1930812","url":null,"abstract":"As Big Data continues to rise in popularity, so does an increased need for protection against potential misuses of data. We are a group of undergraduate Statistical and Data Science major students from Smith College that are actively engaged in ethical discussions concerning the use of data in our society. It can be challenging to predict future trends and technologies in data science that could cause concerns. However, we believe that some essential protections and procedures should be in place to help prevent misuses of data. In particular, we are writing to you to address our concerns with the article “OkCupid Data for Introductory Statistics and Data Science Courses” by Albert Y. Kim and Adriana Escobedo-Land that was published in your journal (Kim and Escobedo-Land 2015). In light of ethical concerns surrounding the article, herein we describe the background of how the dataset was found to contain identifiable information. We communicated this to the authors, who correspondingly corrected the article. In our opinion, there is no doubt that the dataset presented in the article holds pedagogical value as well as research value. One aspect of the educational value of the dataset is the fact that the context of possible analysis could better drive students’ interests. The research value of the data lies within the self-reported nature of the dataset, which usually is the private property of corporations and could be hard to obtain for researchers in universities. Another context in which the pedagogical value of the dataset remains is where students could use this as a case study in discussions of the ethical implications of such data, even practicing anonymization skills with the data. However, we do believe that for the dataset to be used for pedagogical purposes, further anonymizations to the dataset were necessary. Some ways that datasets like this one could be better anonymized in the future include removing unimportant variables that have identification power disproportionate to their value to research. For example, in the case of the OkCupid dataset associated with the paper, the time the data was collected could be removed, since this fact is not particularly essential but can be used for identification. Other sources of concern for this dataset are the variables that reveal geographical and temporal information on individuals. Another method could","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"214 - 215"},"PeriodicalIF":1.7,"publicationDate":"2021-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/26939169.2021.1930812","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42510109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-01 | DOI: 10.1080/26939169.2022.2104767
M. Rethlefsen, H. Norton, Sarah L. Meyer, Katherine A. MacWilkinson, Plato L. Smith II, Haoyang Ye
Abstract Research Reproducibility: Educating for Reproducibility, Pathways to Research Integrity was an interdisciplinary conference hosted virtually by the University of Florida in December 2020. This event brought together educators, researchers, students, policy makers, and industry representatives from across the globe to explore best practices, innovations, and new ideas for education around reproducibility and replicability. Emphasizing a broad view of rigor and reproducibility, the conference touched on many aspects of introducing learners to transparency, rigorous study design, data science, data management, replications, and more. Transdisciplinary themes emerged from the panels, keynote, and submitted papers and poster presentations. The identified themes included lifelong learning, cultivating bottom-up change, “sneaking in” learning, just-in-time learning, targeting learners by career stage, learning by doing, learning how to learn, establishing communities of practice, librarians as interdisciplinary leaders, teamwork skills, rewards and incentives, and implementing top-down change. For each of these themes, we share ideas, practices, and actions as discussed by the conference speakers and attendees.
{"title":"Interdisciplinary Approaches and Strategies from Research Reproducibility 2020: Educating for Reproducibility","authors":"M. Rethlefsen, H. Norton, Sarah L. Meyer, Katherine A. MacWilkinson, Plato L. Smith II, Haoyang Ye","doi":"10.1080/26939169.2022.2104767","DOIUrl":"https://doi.org/10.1080/26939169.2022.2104767","url":null,"abstract":"Abstract Research Reproducibility: Educating for Reproducibility, Pathways to Research Integrity was an interdisciplinary, conference hosted virtually by the University of Florida in December 2020. This event brought together educators, researchers, students, policy makers, and industry representatives from across the globe to explore best practices, innovations, and new ideas for education around reproducibility and replicability. Emphasizing a broad view of rigor and reproducibility, the conference touched on many aspects of introducing learners to transparency, rigorous study design, data science, data management, replications, and more. Transdisciplinary themes emerged from the panels, keynote, and submitted papers and poster presentations. The identified themes included lifelong learning, cultivating bottom-up change, “sneaking in” learning, just-in-time learning, targeting learners by career stage, learning by doing, learning how to learn, establishing communities of practice, librarians as interdisciplinary leaders, teamwork skills, rewards and incentives, and implementing top-down change. For each of these themes, we share ideas, practices, and actions as discussed by the conference speakers and attendees.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"30 1","pages":"219 - 227"},"PeriodicalIF":1.7,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47260615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-25 | DOI: 10.1080/26939169.2021.1946450
Anna D. Peterson, Laura E. Ziegler
Abstract We present an innovative activity that uses data about LEGO sets to help students self-discover multiple linear regression. Students are guided to predict the price of a LEGO set posted on Amazon.com (Amazon price) using LEGO characteristics such as the number of pieces, the theme (i.e., product line), and the general size of the pieces. By starting with graphical displays and simple linear regression, students are able to develop additive multiple linear regression models as well as interaction models to accomplish the task. We provide examples of student responses to the activity and suggestions for teachers based on our experiences. Supplementary materials for this article are available online.
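The model students build regresses Amazon price on set characteristics. The fit itself can be sketched with ordinary least squares via the normal equations (X'X)b = X'y; the LEGO-style numbers below are made up for illustration and are not the article's data, and pure Python is used so the sketch needs no libraries:

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    A = [row[:] + [v] for row, v in zip(XtX, Xty)]   # augmented system
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Illustrative sets: (pieces, large_pieces indicator); prices follow
# price = 5 + 0.10 * pieces + 8 * large exactly, so OLS recovers it.
rows = [(100, 0), (200, 0), (300, 1), (500, 1), (800, 0)]
X = [[1.0, pieces, large] for pieces, large in rows]
y = [5 + 0.10 * pieces + 8 * large for pieces, large in rows]
b0, b_pieces, b_large = fit_ols(X, y)
```

Adding a pieces-by-size product column to each row of X would give the interaction model the activity works up to.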
{"title":"Building a Multiple Linear Regression Model With LEGO Brick Data","authors":"Anna D. Peterson, Laura E. Ziegler","doi":"10.1080/26939169.2021.1946450","DOIUrl":"https://doi.org/10.1080/26939169.2021.1946450","url":null,"abstract":"Abstract We present an innovative activity that uses data about LEGO sets to help students self-discover multiple linear regressions. Students are guided to predict the price of a LEGO set posted on Amazon.com (Amazon price) using LEGO characteristics such as the number of pieces, the theme (i.e., product line), and the general size of the pieces. By starting with graphical displays and simple linear regression, students are able to develop additive multiple linear regression models as well as interaction models to accomplish the task. We provide examples of student responses to the activity and suggestions for teachers based on our experiences. Supplementary materials for this article are available online.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"297 - 303"},"PeriodicalIF":1.7,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/26939169.2021.1946450","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46325594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-02 | DOI: 10.1080/26939169.2021.1936311
G. Ellison
Abstract Temporality-driven covariate classification had limited impact on: the specification of directed acyclic graphs (DAGs) by 85 novice analysts (medical undergraduates); or the risk of bias in DAG-informed multivariable models designed to generate causal inference from observational data. Only 71 students (83.5%) managed to complete the “Temporality-driven Covariate Classification” task, and fewer still completed the “DAG Specification” task (77.6%) or both tasks in succession (68.2%). Most students who completed the first task misclassified at least one covariate (84.5%), and misclassification rates were even higher among students who specified a DAG (92.4%). Nonetheless, across the 512 and 517 covariates considered by each of these tasks, “confounders” were far less likely to be misclassified (11/252, 4.4% and 8/261, 3.1%) than “mediators” (70/123, 56.9% and 56/115, 48.7%) or “competing exposures” (93/137, 67.9% and 86/138, 62.3%), respectively. Since estimates of total causal effects are biased in multivariable models that: fail to adjust for “confounders”; or adjust for “mediators” (or “consequences of the outcome”) misclassified as “confounders” or “competing exposures,” a substantial proportion of any models informed by the present study’s DAGs would have generated biased estimates of total causal effects (50/66, 76.8%); and this would have only been slightly lower for models informed by temporality-driven covariate classification alone (47/71, 66.2%). Supplementary materials for this article are available online.
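The covariate roles in the study can be defined graph-theoretically: relative to an exposure and an outcome, a confounder causes both, a mediator lies on a causal path from exposure to outcome, and a competing exposure causes only the outcome. A minimal sketch under those simplified definitions (the toy DAG, node names, and label strings are our own, not the study's materials, and the study's task definitions may be more nuanced):

```python
def descendants(graph, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def classify(graph, covariate, exposure="E", outcome="O"):
    """Label a covariate by its causal role relative to exposure E
    and outcome O, using simplified reachability-based definitions."""
    down = descendants(graph, covariate)
    if exposure in down and outcome in down:
        return "confounder"
    if covariate in descendants(graph, exposure) and outcome in down:
        return "mediator"
    if outcome in down:
        return "competing exposure"
    if covariate in descendants(graph, outcome):
        return "consequence of the outcome"
    return "other"

# Toy DAG: C -> E, C -> O, E -> M -> O, X -> O, E -> O, O -> S
dag = {"C": ["E", "O"], "E": ["M", "O"], "M": ["O"], "X": ["O"], "O": ["S"]}
```

Under these definitions, C is a confounder, M a mediator, X a competing exposure, and S a consequence of the outcome, which is exactly the classification whose misuse in multivariable models biases total-effect estimates.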
{"title":"Might Temporal Logic Improve the Specification of Directed Acyclic Graphs (DAGs)?","authors":"G. Ellison","doi":"10.1080/26939169.2021.1936311","DOIUrl":"https://doi.org/10.1080/26939169.2021.1936311","url":null,"abstract":"Abstract Temporality-driven covariate classification had limited impact on: the specification of directed acyclic graphs (DAGs) by 85 novice analysts (medical undergraduates); or the risk of bias in DAG-informed multivariable models designed to generate causal inference from observational data. Only 71 students (83.5%) managed to complete the “Temporality-driven Covariate Classification” task, and fewer still completed the “DAG Specification” task (77.6%) or both tasks in succession (68.2%). Most students who completed the first task misclassified at least one covariate (84.5%), and misclassification rates were even higher among students who specified a DAG (92.4%). Nonetheless, across the 512 and 517 covariates considered by each of these tasks, “confounders” were far less likely to be misclassified (11/252, 4.4% and 8/261, 3.1%) than “mediators” (70/123, 56.9% and 56/115, 48.7%) or “competing exposures” (93/137, 67.9% and 86/138, 62.3%), respectively. Since estimates of total causal effects are biased in multivariable models that: fail to adjust for “confounders”; or adjust for “mediators” (or “consequences of the outcome”) misclassified as “confounders” or “competing exposures,” a substantial proportion of any models informed by the present study’s DAGs would have generated biased estimates of total causal effects (50/66, 76.8%); and this would have only been slightly lower for models informed by temporality-driven covariate classification alone (47/71, 66.2%). Supplementary materials for this article are available online.","PeriodicalId":34851,"journal":{"name":"Journal of Statistics and Data Science Education","volume":"29 1","pages":"202 - 213"},"PeriodicalIF":1.7,"publicationDate":"2021-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/26939169.2021.1936311","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44493667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}