
Journal of Behavioral Data Science: latest articles

Disentangling the Influence of Data Contamination in Growth Curve Modeling: A Median Based Bayesian Approach
Pub Date : 2022-07-27 DOI: 10.35566/jbds/v2n2/p1
Tonghao Zhang, Xin Tong, Jianhui Zhou
Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied under the normality assumption, empirical data often violate this assumption in applications. Failure to account for deviation from normality in the data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers were present, resulting in nonnormality. However, this robust approach was shown to perform less satisfactorily when leverage observations existed. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of the random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study was conducted to evaluate the numerical performance of the proposed approach and showed that it yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.
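The link between Laplace distributions and median estimation mentioned in the abstract can be sketched in the simplest one-sample case. The notation (observations $y_i$, location $\mu$, scale $b$) is assumed here for illustration and is not taken from the paper, whose model involves conditional medians at two levels.

```latex
f(y_i \mid \mu, b) = \frac{1}{2b}\exp\!\left(-\frac{|y_i - \mu|}{b}\right),
\qquad
\log L(\mu, b) = -n\log(2b) - \frac{1}{b}\sum_{i=1}^{n} |y_i - \mu| .
```

For any fixed $b > 0$, maximizing $\log L$ over $\mu$ is the same as minimizing $\sum_i |y_i - \mu|$, and a sample median is a minimizer of that sum. This is why a Laplace likelihood turns a median-estimation problem into a maximum likelihood problem.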
Citations: 0
The Impact of Sample Size on Exchangeability in the Bayesian Synthesis Approach to Data Fusion
Pub Date : 2022-07-26 DOI: 10.35566/jbds/v2n1/p5
Katerina M. Marcoulides, Jia Quan, Eric Wright
Data fusion approaches have been adopted to facilitate more complex analyses and produce more accurate results. Bayesian Synthesis is a relatively new approach to data fusion where results from the analysis of one dataset are used as prior information for the analysis of the next dataset. Datasets of interest are sequentially analyzed until a final posterior distribution is created, incorporating information from all candidate datasets, rather than simply combining the datasets into one large dataset and analyzing them simultaneously. One concern with this approach lies in the sequence of datasets being fused. This study examines whether the order of datasets matters when the datasets being fused each have substantially different sample sizes. The performance of Bayesian Synthesis with varied sample sizes is evaluated by examining results from simulated data with known population values under a variety of conditions. Results suggest that the order in which the datasets are fused can have a significant impact on the obtained estimates.
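The sequential "posterior becomes the next prior" idea can be sketched with a toy model. This is a minimal illustration, assuming a normal mean with known noise variance and a conjugate normal prior; the function names and simulated data are hypothetical and are not the models or conditions used in the study. In this idealized conjugate case the final posterior is identical whichever dataset comes first, so the sketch shows only the mechanics of sequential updating and does not reproduce the order effects reported in the abstract.

```python
import random


def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Conjugate update for a normal mean with known noise variance."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


def synthesize(datasets, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Analyze datasets one at a time; each posterior is the next prior."""
    mean, var = prior_mean, prior_var
    for data in datasets:
        mean, var = update_normal_mean(mean, var, data, noise_var)
    return mean, var


# Hypothetical simulated datasets with very different sample sizes.
rng = random.Random(0)
small = [rng.gauss(0.5, 1.0) for _ in range(20)]
large = [rng.gauss(0.5, 1.0) for _ in range(2000)]

small_first = synthesize([small, large])
large_first = synthesize([large, small])
print(small_first, large_first)
```

A useful check when adapting this to a non-conjugate model is to run both orders and compare the resulting posteriors, since any discrepancy then reflects how the intermediate posterior was summarized and passed on.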
Citations: 2
Book Review: An Introduction to Nonparametric Statistics
Pub Date : 2022-07-05 DOI: 10.35566/jbds/v2n1/p8
Kévin Allan Sales Rodrigues
This is a brief comparative review of the book An Introduction to Nonparametric Statistics.
Citations: 1
The Lighting of the BECONs
Pub Date : 2022-07-04 DOI: 10.35566/jbds/v2n1/p1
D. Borsboom, T. Blanken, F. Dablander, Frenk van Harreveld, C. Tanis, P. Van Mieghem
The imposition of lockdowns in response to the COVID-19 outbreak has underscored the importance of human behavior in mitigating virus transmission. The scientific study of interventions designed to change behavior (e.g., to promote physical distancing) requires measures of effectiveness that are fast, that can be assessed through experiments, and that can be investigated without actual virus transmission. This paper presents a methodological approach designed to deliver such indicators. We show how behavioral data, obtainable through wearable assessment devices or camera footage, can be used to assess the effect of interventions in experimental research; in addition, the approach can be extended to longitudinal data involving contact tracing apps. Our methodology operates by constructing a contact network: a representation that encodes which individuals have been in physical proximity long enough to transmit the virus. Because behavioral interventions alter the contact network, a comparison of contact networks before and after the intervention can provide information on the effectiveness of the intervention. We call indicators based on this idea Behavioral Contact Network (BECON) indicators. We examine the performance of three indicators: the Density BECON, based on differences in network density; the Spectral BECON, based on differences in the eigenvector of the adjacency matrix; and the ASPL BECON, based on differences in average shortest path lengths. Using simulations, we show that all three indicators can effectively track the effect of behavioral interventions. Even in conditions with significant amounts of noise, BECON indicators can reliably identify and order effect sizes of interventions. The present paper invites further study of the method as well as practical implementations to test the validity of BECON indicators in real data.
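The before/after comparison of contact networks can be sketched for two of the quantities named above, network density and average shortest path length. This is a minimal illustration with hypothetical toy networks; the function names, the handling of unreachable pairs, and the simple differences at the end are assumptions made here and are not the paper's exact BECON definitions.

```python
from collections import deque


def density(n_nodes, edges):
    """Share of possible undirected contacts that actually occurred."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def average_shortest_path_length(n_nodes, edges):
    """Mean shortest path over reachable node pairs (assumption:
    unreachable pairs are skipped rather than counted as infinite)."""
    neighbors = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total, pairs = 0, 0
    for source in range(n_nodes):
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target > source:  # count each unordered pair once
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


# Hypothetical contact networks for four people.
before = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # everyone met
after = [(0, 1), (1, 2), (2, 3)]                           # chain of contacts

density_drop = density(4, before) - density(4, after)
aspl_rise = (average_shortest_path_length(4, after)
             - average_shortest_path_length(4, before))
print(density_drop, aspl_rise)
```

A lower density and a longer average path after the intervention both indicate that contacts became sparser, which is the direction a distancing intervention aims for.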
Citations: 1
How to Select the Best Fit Model among Bayesian Latent Growth Models for Complex Data
Pub Date : 2022-06-23 DOI: 10.35566/jbds/v2n1/p2
Laura Lu, Zhiyong Zhang
The Bayesian approach is becoming increasingly important as it provides many advantages in dealing with complex data. However, there is no well-defined model selection criterion or index in a Bayesian context. To address the challenges, new indices are needed. The goal of this study is to propose new model selection indices and to investigate their performance in the framework of latent growth mixture models with missing data and outliers in a Bayesian context. We consider latent growth models because they are very flexible in modeling complex data and are becoming increasingly popular in statistical, psychological, behavioral, and educational areas. Specifically, this study conducted five simulation studies to cover different cases, including latent growth curve models with missing data, latent growth curve models with missing data and outliers, growth mixture models with missing data and outliers, extended growth mixture models with missing data and outliers, and latent growth models with different classes. Simulation results show that almost all proposed indices can effectively identify the true model. This study also illustrated the application of these model selection indices in real data analysis.
Citations: 1
Does Minority Case Sampling Improve Performance with Imbalanced Outcomes in Psychological Research?
Pub Date : 2022-06-15 DOI: 10.35566/jbds/v2n1/p3
R. Jacobucci, Xiaobei Li
In psychological research, class imbalance in binary outcome variables is a common occurrence, particularly in clinical variables (e.g., suicide outcomes). Class imbalance can present a number of difficulties for inference and prediction, prompting the development of a number of strategies that perform data augmentation through random sampling from just the positive cases, or from both the positive and negative cases. Through evaluation in benchmark datasets from computer science, these methods have shown marked improvements in predictive performance when the outcome is imbalanced. However, questions remain regarding generalizability to psychological data. To study this, we implemented a simulation study that tests a number of popular sampling strategies implemented in easy-to-use software, as well as an empirical example focusing on the prediction of suicidal thoughts. In general, we found that while one sampling strategy demonstrated far worse performance even in comparison to no sampling, the other sampling methods performed similarly, evidencing slight improvements over no sampling. Further, we evaluated the sampling strategies across different forms of cross-validation, model fit metrics, and machine learning algorithms.
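Sampling "from just the positive cases" can be sketched as random oversampling of the minority class. This is a minimal illustration, assuming a binary outcome coded 0/1 with 1 as the minority class; the function name and toy data are hypothetical, and the abstract does not say which specific strategies or software the study compared.

```python
import random


def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority cases until classes are balanced."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority cases to sample from")
    n_extra = max(len(majority) - len(minority), 0)
    # Sample with replacement from the minority cases only.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: nine negative cases and one positive case.
rows = list(range(10))
labels = [0] * 9 + [1]
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))
```

When such resampling is combined with cross-validation, it is usually applied to the training portion of each split only, so that duplicated cases do not leak into the data used for evaluation.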
Citations: 0
The Role of Personality in Trust in Public Policy Automation
Pub Date : 2022-05-11 DOI: 10.35566/jbds/v2n1/p4/
Philip D. Waggoner, Ryan Kennedy
Algorithms play an increasingly important role in public policy decision-making. Despite this consequential role, little effort has been made to evaluate the extent to which people trust algorithms in decision-making, much less the personality characteristics associated with higher levels of trust. Such evaluations inform the widespread adoption and efficacy of algorithms in public policy decision-making. We explore the role of major personality inventories -- need for cognition, need to evaluate, the "Big 5" -- in shaping an individual's trust in public policy algorithms, specifically dealing with criminal justice sentencing. Through an original survey experiment, we find strong correlations between all personality types and general levels of trust in automation, as expected. Further, we uncovered evidence that need for cognition increases the weight given to advice from an algorithm relative to humans, and "agreeableness" decreases the distance between respondents' expectations and advice from a judge, relative to advice from a crowd.
Citations: 1
A Weighted Residual Bootstrap Method for Multilevel Modeling with Sampling Weights
Pub Date : 2021-12-02 DOI: 10.35566/jbds/v1n2/p6
Wen Luo, Hok Chio Lai
Multilevel modeling is often used to analyze survey data collected with a multistage sampling design. When the selection is informative, sampling weights need to be incorporated in the estimation. We propose a weighted residual bootstrap method as an alternative to the multilevel pseudo-maximum likelihood (MPML) estimators. In a Monte Carlo simulation using two-level linear mixed effects models, the bootstrap method showed advantages over MPML for the estimates and the statistical inferences of the intercept, the slope of the level-2 predictor, and the variance components at level-2. The impact of sample size, selection mechanism, intraclass correlation (ICC), and distributional assumptions on the performance of the methods was examined. The performance of MPML was suboptimal when sample size and ICC were small and when the normality assumption was violated. The bootstrap estimates performed generally well across all the simulation conditions, but had notably suboptimal performance in estimating the covariance component in a random slopes model when sample size and ICCs were large. As an illustration, the bootstrap method is applied to the American data of the OECD’s Programme for International Student Assessment (PISA) survey on math achievement using the R package bootmlm.
Citations: 0
Structural Equation Modeling using Stata
Pub Date : 2021-12-02 DOI: 10.35566/jbds/v1n2/p7
Meghan K Cain
In this tutorial, you will learn how to fit structural equation models (SEM) using Stata software. SEMs can be fit in Stata using the sem command for standard linear SEMs, the gsem command for generalized linear SEMs, or by drawing their path diagrams in the SEM Builder. After a brief introduction to Stata, the sem command will be demonstrated through a confirmatory factor analysis model, mediation model, group analysis, and a growth curve model, and the gsem command will be demonstrated through a random-slope model and a logistic ordinal regression. Materials and datasets are provided online, allowing anyone with Stata to follow along.
Citations: 7
GPS2space: An Open-source Python Library for Spatial Measure Extraction from GPS Data
Pub Date : 2021-11-08 DOI: 10.35566/jbds/v1n2/p5
Shuai Zhou, Yanling Li, G. Chi, Junjun Yin, Zita Oravecz, Yosef Bodovski, N. Friedman, S. Vrieze, Sy-Miin Chow
Global Positioning System (GPS) data have become one of the routine data streams collected by wearable devices, cell phones, and social media platforms in this digital age. Such data provide research opportunities in that they may provide contextual information to elucidate where, when, and why individuals engage in and sustain particular behavioral patterns. However, raw GPS data consisting of densely sampled time series of latitude and longitude coordinate pairs do not readily convey meaningful information concerning intra-individual dynamics and inter-individual differences; substantial data processing is required. Raw GPS data need to be integrated into a Geographic Information System (GIS) and analyzed, from which the mobility and activity patterns of individuals can be derived, a process that is unfamiliar to many behavioral scientists. In this tutorial article, we introduced GPS2space, a free and open-source Python library that we developed to facilitate the processing of GPS data, integration with GIS to derive distances from landmarks of interest, as well as extraction of two spatial features: activity space of individuals and shared space between individuals, such as members of the same family. We demonstrated functions available in the library using data from the Colorado Online Twin Study to explore seasonal and age-related changes in individuals' activity space and twin siblings' shared space, as well as gender, zygosity and baseline age-related differences in their initial levels and/or changes over time. We concluded with discussions of other potential usages, caveats, and future developments of GPS2space.
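As a rough illustration of the "distances from landmarks of interest" idea mentioned in the abstract, the following minimal sketch computes the great-circle distance from each GPS fix to its nearest landmark. It is not the GPS2space API: the function names `haversine_km` and `nearest_landmark_km`, the Earth-radius constant, and the sample coordinates are all assumptions made for this example, and only the Python standard library is used.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_landmark_km(points, landmarks):
    """For each GPS fix, return the distance (km) to the closest landmark.

    `points` and `landmarks` are hypothetical lists of (lat, lon) tuples.
    """
    return [
        min(haversine_km(lat, lon, l_lat, l_lon) for l_lat, l_lon in landmarks)
        for lat, lon in points
    ]


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    fixes = [(40.0, -105.0), (40.1, -105.2)]
    landmarks = [(40.0, -105.1), (39.9, -105.3)]
    for fix, dist in zip(fixes, nearest_landmark_km(fixes, landmarks)):
        print(fix, round(dist, 2), "km")
```

A brute-force minimum over all landmarks is adequate for small inputs; for densely sampled GPS series, a spatial index or a projected coordinate system would likely be preferable.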
Citations: 4