While many strategies for protecting personal privacy have relied on regulatory frameworks, consent and anonymizing data, such approaches are not always effective. Frameworks and Terms and Conditions often lag behind user behaviour and advances in technology and software; consent can be provisional and fragile; and the anonymization of data may impede personalized learning. This paper reports on a dialogical multi-case study of four Massive Open Online Course (MOOC) providers from different geopolitical and regulatory contexts. It explores how the providers (1) define 'personal data' and whether they acknowledge a category of 'special' or 'sensitive' data; (2) address the issue and scope of student consent (and define that scope); and (3) use student data to inform pedagogy and/or adapt the learning experience, whether to personalise the context or to increase student retention and success rates. This study found that large amounts of personal data continue to be collected for purposes seemingly unrelated to the delivery and support of courses. The capacity for users to withdraw or withhold consent for the collection of certain categories of data, such as sensitive personal data, remains severely constrained. This paper proposes that user consent at the time of registration should be reconsidered, and that there is a particular need for consent when sensitive personal data are used to personalize learning, or for purposes outside the original intention of obtaining consent.
{"title":"The unbearable lightness of consent: mapping MOOC providers' response to consent","authors":"Mohammad Khalil, P. Prinsloo, Sharon Slade","doi":"10.1145/3231644.3231659","DOIUrl":"https://doi.org/10.1145/3231644.3231659","url":null,"abstract":"While many strategies for protecting personal privacy have relied on regulatory frameworks, consent and anonymizing data, such approaches are not always effective. Frameworks and Terms and Conditions often lag user behaviour and advances in technology and software; consent can be provisional and fragile; and the anonymization of data may impede personalized learning. This paper reports on a dialogical multi-case study methodology of four Massive Open Online Course (MOOC) providers from different geopolitical and regulatory contexts. It explores how the providers (1) define 'personal data' and whether they acknowledge a category of 'special' or 'sensitive' data; (2) address the issue and scope of student consent (and define that scope); and (3) use student data in order to inform pedagogy and/or adapt the learning experience to personalise the context or to increase student retention and success rates. This study found that large amounts of personal data continue to be collected for purposes seemingly unrelated to the delivery and support of courses. The capacity for users to withdraw or withhold consent for the collection of certain categories of data such as sensitive personal data remains severely constrained. This paper proposes that user consent at the time of registration should be reconsidered, and that there is a particular need for consent when sensitive personal data are used to personalize learning, or for purposes outside the original intention of obtaining consent.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86971887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Montebello, Petrilson Pinheiro, B. Cope, M. Kalantzis, Tabassum Amina, Duane Searsmith, D. Cao
Student performance over the course of an academic program can be significantly and positively influenced by a series of feedback processes involving peers and tutors. Ideally, this feedback is structured and incremental, and as a consequence the resulting data reach a large scale even in relatively small classes. In this paper, we investigate the effect of such processes by analyzing assessment data collected from online courses. We plan to fully analyze the massive dataset of over three and a half million granular data points generated, to make the case for the scalability of these kinds of learning analytics. This could shed crucial light on assessment mechanisms in MOOCs as we continue to refine our processes in an effort to strike a balance between formative and summative assessment.
{"title":"The impact of the peer review process evolution on learner performance in e-learning environments","authors":"M. Montebello, Petrilson Pinheiro, B. Cope, M. Kalantzis, Tabassum Amina, Duane Searsmith, D. Cao","doi":"10.1145/3231644.3231693","DOIUrl":"https://doi.org/10.1145/3231644.3231693","url":null,"abstract":"Student performance over a course of an academic program can be significantly affected and positively influenced through a series of feedback processes by peers and tutors. Ideally, this feedback is structured and incremental, and as a consequence, data presents at large scale even in relatively small classes. In this paper, we investigate the effect of such processes as we analyze assessment data collected from online courses. We plan to fully analyze the massive dataset of over three and a half million granular data points generated to make the case for the scalability of these kinds of learning analytics. This could shed crucial light on assessment mechanism in MOOCs, as we continue to refine our processes in an effort to strike a balance of emphasis on formative in addition to summative assessment.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"108 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79219799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online learning platforms such as edX generate usage statistics that can be valuable to educators. However, handling this raw data can prove challenging and time-consuming for instructors and course designers. The raw data for the MIT courses running on the edX platform (MITx courses) are pre-processed and stored in a Google BigQuery database. We designed a tool based on Python and additional open-source Python packages, such as Jupyter Notebook, to enable instructors to analyze their student data easily and securely. We expect that instructors will be encouraged to adopt more evidence-based teaching practices based on their interaction with the data.
{"title":"Managing and analyzing student learning data: a python-based solution for edX","authors":"Vita Lampietti, Anindya Roy, Sheryl Barnes","doi":"10.1145/3231644.3231706","DOIUrl":"https://doi.org/10.1145/3231644.3231706","url":null,"abstract":"Online learning platforms, such as edX, generate usage statistics data that can be valuable to educators. However, handling this raw data can prove challenging and time consuming for instructors and course designers. The raw data for the MIT courses running on the edX platform (MITx courses) are pre-processed and stored in a Google BigQuery database. We designed a tool based on Python and additional open-source Python packages such as Jupyter Notebook, to enable instructors to analyze their student data easily and securely. We expect that instructors would be encouraged to adopt more evidence-based teaching practices based on their interaction with the data.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85040767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Within the field of online tutoring systems for learning programming, such as Code.org's Hour of Code, there is a trend towards using previous student data to give hints. This paper shows that it is better to use expert knowledge to provide hints in environments such as Code.org's Hour of Code. We present a heuristic-based approach to generating next-step hints. We use pattern-matching algorithms to identify heuristics and apply each identified heuristic to an input program. We generate a next-step hint by selecting the highest-scoring heuristic using a scoring function. By comparing our results with the results of a previous experiment on Hour of Code, we show that a heuristic-based approach to providing hints gives results that are impossible to further improve. These basic heuristics are sufficient to efficiently mimic experts' next-step hints.
{"title":"Use expert knowledge instead of data: generating hints for hour of code exercises","authors":"M. Buwalda, J. Jeuring, N. Naus","doi":"10.1145/3231644.3231690","DOIUrl":"https://doi.org/10.1145/3231644.3231690","url":null,"abstract":"Within the field of on-line tutoring systems for learning programming, such as Code.org's Hour of code, there is a trend to use previous student data to give hints. This paper shows that it is better to use expert knowledge to provide hints in environments such as Code.org's Hour of code. We present a heuristic-based approach to generating next-step hints. We use pattern matching algorithms to identify heuristics and apply each identified heuristic to an input program. We generate a next-step hint by selecting the highest scoring heuristic using a scoring function. By comparing our results with results of a previous experiment on Hour of code we show that a heuristics-based approach to providing hints gives results that are impossible to further improve. These basic heuristics are sufficient to efficiently mimic experts' next-step hints.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86429115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radek Pelánek, Tomáš Effenberger, Matej Vanek, Vojtech Sassmann, Dominik Gmiterko
A personalized learning system needs a large pool of items for learners to solve. When working with a large pool of items, it is useful to measure the similarity of items. We outline a general approach to measuring the similarity of items and discuss specific measures for items used in introductory programming. Evaluating the quality of similarity measures is difficult; to this end, we propose an evaluation approach utilizing three levels of abstraction. We illustrate our approach to measuring similarity and provide an evaluation using items from three diverse programming environments.
{"title":"Measuring item similarity in introductory programming","authors":"Radek Pelánek, Tomáš Effenberger, Matej Vanek, Vojtech Sassmann, Dominik Gmiterko","doi":"10.1145/3231644.3231676","DOIUrl":"https://doi.org/10.1145/3231644.3231676","url":null,"abstract":"A personalized learning system needs a large pool of items for learners to solve. When working with a large pool of items, it is useful to measure the similarity of items. We outline a general approach to measuring the similarity of items and discuss specific measures for items used in introductory programming. Evaluation of quality of similarity measures is difficult. To this end, we propose an evaluation approach utilizing three levels of abstraction. We illustrate our approach to measuring similarity and provide evaluation using items from three diverse programming environments.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"406 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76767775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective written communication is an essential skill that promotes educational success for undergraduates. However, undergraduate students, especially those in their first year at university, are unused to this form of writing. After their long experience with the schoolroom essay, academic writing development is painstakingly slow for most undergraduates. Thus students, especially those with poor writing abilities, should write more in order to become better writers. Yet the biggest impediment to more writing is that overburdened tutors can ask for only a limited number of drafts from their students. Today, there exist powerful computational language technologies that can evaluate student writing, saving time and providing timely, reliable feedback that can support educators' marking processes. This paper motivates an updated visual analytics dashboard, XIPIt, which introduces a set of visual and writing analytics features embedded in a marking environment built on XIP output.
{"title":"XIPIt","authors":"Duygu Bektik","doi":"10.1145/3231644.3231696","DOIUrl":"https://doi.org/10.1145/3231644.3231696","url":null,"abstract":"Effective written communication is an essential skill which promotes educational success for undergraduates. However, undergraduate students, especially those in their first year at university, are unused to this form of writing. After their long experience with the schoolroom essay, for most undergraduates academic writing development is painstakingly slow. Thus, especially those with poor writing abilities, should write more to be better writers. Yet, the biggest impediment to more writing is that overburdened tutors would ask limited number of drafts from their students. Today, there exist powerful computational language technologies that could evaluate student writing, saving time and providing timely, speedy, reliable feedback which can support educators marking process. This paper motivates an updated visual analytics dashboard, XIPIt, to introduce a set of visual and writing analytics features embedded in a marking environment built on XIP output.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79349498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Aleven, J. Sewall, J. M. Andres, R. Sottilare, Rodney A. Long, R. Baker
Instruction that adapts to individual learner characteristics is often more effective than instruction that treats all learners as the same. A practical approach to making MOOCs adapt to learners may be to integrate frameworks for intelligent tutoring systems (ITSs). Using the Learning Tools Interoperability (LTI) standard, we integrated two intelligent tutoring frameworks (GIFT and CTAT) into edX. We describe our initial explorations of four adaptive instructional patterns in the PennX MOOC "Big Data and Education." The work illustrates one route to adaptivity at scale.
{"title":"Towards adapting to learners at scale: integrating MOOC and intelligent tutoring frameworks","authors":"V. Aleven, J. Sewall, J. M. Andres, R. Sottilare, Rodney A. Long, R. Baker","doi":"10.1145/3231644.3231671","DOIUrl":"https://doi.org/10.1145/3231644.3231671","url":null,"abstract":"Instruction that adapts to individual learner characteristics is often more effective than instruction that treats all learners as the same. A practical approach to making MOOCs adapt to learners may be by integrating frameworks for intelligent tutoring systems (ITSs). Using the Learning Tools Interoperability standard (LTI), we integrated two intelligent tutoring frameworks (GIFT and CTAT) into edX. We describe our initial explorations of four adaptive instructional patterns in the PennX MOOC \"Big Data and Education.\" The work illustrates one route to adaptivity at scale.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73698514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study analyzes the use of paper exams in college-level STEM courses. It leverages a unique dataset of nearly 1,800 exams, which were scanned into a web application and then processed by a team of annotators to yield a detailed snapshot of the way instructors currently structure exams. The focus of the investigation is on the variety of question formats and how they are applied across different course topics. The analysis divides questions into seven top-level categories, finding significant differences among these in terms of positioning, use across subjects, and student performance. The analysis also reveals a strong tendency within the collection for instructors to order questions from easier to harder. A linear mixed effects model is used to estimate the reliability of different question types. Long writing questions stand out for their high reliability, while binary and multiple-choice questions have low reliability. The model suggests that more than three multiple-choice questions, or more than five binary questions, are required to attain the same reliability as a single long writing question. A correlation analysis across seven response types finds that correlations between student abilities on different question types exceed 70 percent for all pairs, although binary and multiple-choice questions stand out for having unusually low correlations with all other question types.
{"title":"How do professors format exams?: an analysis of question variety at scale","authors":"Paul Laskowski, Sergey Karayev, Marti A. Hearst","doi":"10.1145/3231644.3231667","DOIUrl":"https://doi.org/10.1145/3231644.3231667","url":null,"abstract":"This study analyzes the use of paper exams in college-level STEM courses. It leverages a unique dataset of nearly 1,800 exams, which were scanned into a web application, then processed by a team of annotators to yield a detailed snapshot of the way instructors currently structure exams. The focus of the investigation is on the variety of question formats, and how they are applied across different course topics. The analysis divides questions according to seven top-level categories, finding significant differences among these in terms of positioning, use across subjects, and student performance. The analysis also reveals a strong tendency within the collection for instructors to order questions from easier to harder. A linear mixed effects model is used to estimate the reliability of different question types. Long writing questions stand out for their high reliability, while binary and multiple choice questions have low reliability. The model suggests that over three multiple choice questions, or over five binary questions, are required to attain the same reliability as a single long writing question. A correlation analysis across seven response types finds that student abilities for different questions types exceed 70 percent for all pairs, although binary and multiple-choice questions stand out for having unusually low correlations with all other question types.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76245218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sorathan Chaturapruek, T. Dee, Ramesh Johari, René F. Kizilcec, M. Stevens
College students rely on increasingly data-rich environments when making learning-relevant decisions about the courses they take and their expected time commitments. However, we know little about how exposure to such data may influence student course choice, effort regulation, and performance. We conducted a large-scale field experiment in which all undergraduates at a large, selective university were randomized to an encouragement to use a course-planning web application that integrates information from official transcripts from the past fifteen years with detailed end-of-course evaluation surveys. We found that use of the platform lowered students' GPA by 0.28 standard deviations on average. In a subsequent field experiment, we varied access to information about course grades and time commitment on the platform and found that access to grade information in particular lowered students' overall GPA. Our exploratory analysis suggests these effects are due not to changes in the portfolio of courses that students choose, but rather to changes in their behavior within courses.
{"title":"How a data-driven course planning tool affects college students' GPA: evidence from two field experiments","authors":"Sorathan Chaturapruek, T. Dee, Ramesh Johari, René F. Kizilcec, M. Stevens","doi":"10.1145/3231644.3231668","DOIUrl":"https://doi.org/10.1145/3231644.3231668","url":null,"abstract":"College students rely on increasingly data-rich environments when making learning-relevant decisions about the courses they take and their expected time commitments. However, we know little about how their exposure to such data may influence student course choice, effort regulation, and performance. We conducted a large-scale field experiment in which all the undergraduates at a large, selective university were randomized to an encouragement to use a course-planning web application that integrates information from official transcripts from the past fifteen years with detailed end-of-course evaluation surveys. We found that use of the platform lowered students' GPA by 0.28 standard deviations on average. In a subsequent field experiment, we varied access to information about course grades and time commitment on the platform and found that access to grade information in particular lowered students' overall GPA. Our exploratory analysis suggests these effects are not due to changes in the portfolio of courses that students choose, but rather by changes to their behavior within courses.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72801200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper applies theory and methodology from the learning design literature to large-scale learning environments through quantitative modeling of the structure and design of Massive Open Online Courses. For two institutions of higher education, we automate the task of encoding pedagogy and learning design principles for 177 courses (which accounted for nearly 4 million enrollments). Course materials from these MOOCs are parsed and abstracted into sequences of components, such as videos and problems. Our key contributions are (i) describing the parsing and abstraction of courses for quantitative analyses, (ii) the automated categorization of similar course designs, and (iii) the identification of key structural components that show relationships between categories and learning design principles. We employ two methods to categorize similar course designs---one aimed at clustering courses using transition probabilities and another using trajectory mining. We then proceed with an exploratory analysis of relationships between our categorization and learning outcomes.
{"title":"Toward large-scale learning design: categorizing course designs in service of supporting learning outcomes","authors":"Dan Davis, Daniel T. Seaton, C. Hauff, G. Houben","doi":"10.1145/3231644.3231663","DOIUrl":"https://doi.org/10.1145/3231644.3231663","url":null,"abstract":"This paper applies theory and methodology from the learning design literature to large-scale learning environments through quantitative modeling of the structure and design of Massive Open Online Courses. For two institutions of higher education, we automate the task of encoding pedagogy and learning design principles for 177 courses (which accounted for for nearly 4 million enrollments). Course materials from these MOOCs are parsed and abstracted into sequences of components, such as videos and problems. Our key contributions are (i) describing the parsing and abstraction of courses for quantitative analyses, (ii) the automated categorization of similar course designs, and (iii) the identification of key structural components that show relationships between categories and learning design principles. We employ two methods to categorize similar course designs---one aimed at clustering courses using transition probabilities and another using trajectory mining. We then proceed with an exploratory analysis of relationships between our categorization and learning outcomes.","PeriodicalId":20634,"journal":{"name":"Proceedings of the Fifth Annual ACM Conference on Learning at Scale","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74630196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}