{"title":"量化数据敏感性:在建立学生预测模型时精确地展示了谨慎","authors":"Charles Lang, Charlotte Woo, Jeanne Sinclair","doi":"10.1145/3375462.3375506","DOIUrl":null,"url":null,"abstract":"Until recently an assumption within the predictive modelling community has been that collecting more student data is always better. But in reaction to recent high profile data privacy scandals, many educators, scholars, students and administrators have been questioning the ethics of such a strategy. Suggestions are growing that the minimum amount of data should be collected to aid the function for which a prediction is being made. Yet, machine learning algorithms are primarily judged on metrics derived from prediction accuracy or whether they meet probabilistic criteria for significance. They are not routinely judged on whether they utilize the minimum number of the least sensitive features, preserving what we name here as data collection parsimony. We believe the ability to assess data collection parsimony would be a valuable addition to the suite of evaluations for any prediction strategy and to that end, the following paper provides an introduction to data collection parsimony, describes a novel method for quantifying the concept using empirical Bayes estimates and then tests the metric on real world data. Both theoretical and empirical benefits and limitations of this method are discussed. We conclude that for the purpose of model building this metric is superior to others in several ways, but there are some hurdles to effective implementation.","PeriodicalId":355800,"journal":{"name":"Proceedings of the Tenth International Conference on Learning Analytics & Knowledge","volume":"291 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Quantifying data sensitivity: precise demonstration of care when building student prediction models\",\"authors\":\"Charles Lang, Charlotte Woo, Jeanne Sinclair\",\"doi\":\"10.1145/3375462.3375506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Until recently an assumption within the predictive modelling community has been that collecting more student data is always better. But in reaction to recent high profile data privacy scandals, many educators, scholars, students and administrators have been questioning the ethics of such a strategy. Suggestions are growing that the minimum amount of data should be collected to aid the function for which a prediction is being made. Yet, machine learning algorithms are primarily judged on metrics derived from prediction accuracy or whether they meet probabilistic criteria for significance. They are not routinely judged on whether they utilize the minimum number of the least sensitive features, preserving what we name here as data collection parsimony. We believe the ability to assess data collection parsimony would be a valuable addition to the suite of evaluations for any prediction strategy and to that end, the following paper provides an introduction to data collection parsimony, describes a novel method for quantifying the concept using empirical Bayes estimates and then tests the metric on real world data. Both theoretical and empirical benefits and limitations of this method are discussed. We conclude that for the purpose of model building this metric is superior to others in several ways, but there are some hurdles to effective implementation.\",\"PeriodicalId\":355800,\"journal\":{\"name\":\"Proceedings of the Tenth International Conference on Learning Analytics & Knowledge\",\"volume\":\"291 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-03-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Tenth International Conference on Learning Analytics & Knowledge\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3375462.3375506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Tenth International Conference on Learning Analytics & Knowledge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3375462.3375506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Quantifying data sensitivity: precise demonstration of care when building student prediction models
Until recently an assumption within the predictive modelling community has been that collecting more student data is always better. But in reaction to recent high profile data privacy scandals, many educators, scholars, students and administrators have been questioning the ethics of such a strategy. Suggestions are growing that the minimum amount of data should be collected to aid the function for which a prediction is being made. Yet, machine learning algorithms are primarily judged on metrics derived from prediction accuracy or whether they meet probabilistic criteria for significance. They are not routinely judged on whether they utilize the minimum number of the least sensitive features, preserving what we name here as data collection parsimony. We believe the ability to assess data collection parsimony would be a valuable addition to the suite of evaluations for any prediction strategy and to that end, the following paper provides an introduction to data collection parsimony, describes a novel method for quantifying the concept using empirical Bayes estimates and then tests the metric on real world data. Both theoretical and empirical benefits and limitations of this method are discussed. We conclude that for the purpose of model building this metric is superior to others in several ways, but there are some hurdles to effective implementation.