Imputation methods for mixed datasets in bioarchaeology

IF 2.1 · Zone 2 (Earth Sciences) · JCR Q1 (Anthropology) · Archaeological and Anthropological Sciences · Pub Date: 2024-10-23 · DOI: 10.1007/s12520-024-02078-2
Jessica Ryan-Despraz, Amanda Wissler
Archaeological and Anthropological Sciences, volume 16, issue 11. Publisher page: https://link.springer.com/article/10.1007/s12520-024-02078-2 · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11496361/pdf/
Citations: 0

Abstract

Missing data is a prevalent problem in bioarchaeological research, and imputation could provide a promising solution. This work simulated missingness on a control dataset (481 samples × 41 variables) in order to explore imputation methods for mixed data (qualitative and quantitative data). The tested methods included Random Forest (RF), PCA/MCA, factor analysis of mixed data (FAMD), hot-deck, predictive mean matching (PMM), random samples from observed values (RSOV), and a multi-method (MM) approach. Each was tested under the three missingness mechanisms (MCAR, MAR, and MNAR) at levels of 5%, 10%, 20%, 30%, and 40% missingness. This study also compared single imputation with an adapted multiple imputation method derived from the R package “mice”. The results showed that the adapted multiple imputation technique always outperformed single imputation for the same method. The best performing methods were most often RF and MM, and other commonly successful methods were PCA/MCA and PMM multiple imputation. Across all criteria, the amount of missingness was the most important parameter for imputation accuracy. While this study found that some imputation methods performed better than others for the control dataset, each imputation method has advantages and disadvantages. Imputation remains a promising solution for datasets containing missingness; however, when choosing a method, it is essential to consider dataset structure and research goals.
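The simulate-then-impute workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code (the study worked in R), and it covers only the simplest case: MCAR missingness imputed with random samples from observed values (RSOV). The toy dataset, the column names implied by it, and the two accuracy measures (RMSE for the numeric column, error rate for the categorical columns) are assumptions made for illustration; the abstract does not state which accuracy criteria were used.

```python
import math
import random

NUMERIC_COLS = {0}  # assumption: column 0 is numeric, columns 1 and 2 are categorical


def make_toy_data(n_rows, rng):
    """Build a hypothetical mixed dataset: one measurement and two categorical traits."""
    return [
        [round(rng.gauss(165.0, 8.0), 1),
         rng.choice(["F", "M"]),
         rng.choice(["young", "middle", "old"])]
        for _ in range(n_rows)
    ]


def simulate_mcar(rows, fraction, rng):
    """Blank out a fixed fraction of cells, chosen uniformly at random (MCAR)."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = set(rng.sample(cells, round(fraction * len(cells))))
    holed = [
        [None if (i, j) in masked else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return holed, masked


def impute_rsov(holed, rng):
    """Fill each missing cell with a random draw from that column's observed values."""
    n_cols = len(holed[0])
    observed = [[row[j] for row in holed if row[j] is not None] for j in range(n_cols)]
    if any(not column for column in observed):
        raise ValueError("a column has no observed values to sample from")
    return [
        [value if value is not None else rng.choice(observed[j])
         for j, value in enumerate(row)]
        for row in holed
    ]


def evaluate(truth, imputed, masked, numeric_cols=NUMERIC_COLS):
    """Return (RMSE over masked numeric cells, error rate over masked categorical cells)."""
    sq_errors = [(truth[i][j] - imputed[i][j]) ** 2
                 for i, j in masked if j in numeric_cols]
    mismatches = [truth[i][j] != imputed[i][j]
                  for i, j in masked if j not in numeric_cols]
    rmse = math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else float("nan")
    error_rate = sum(mismatches) / len(mismatches) if mismatches else float("nan")
    return rmse, error_rate


if __name__ == "__main__":
    rng = random.Random(0)
    truth = make_toy_data(200, rng)
    # Missingness levels taken from the abstract.
    for fraction in (0.05, 0.10, 0.20, 0.30, 0.40):
        holed, masked = simulate_mcar(truth, fraction, rng)
        imputed = impute_rsov(holed, rng)
        rmse, error_rate = evaluate(truth, imputed, masked)
        print(f"{fraction:.0%} missing: RMSE={rmse:.2f}, "
              f"categorical error rate={error_rate:.2f}")
```

Because the true values are known before masking, any imputation method can be swapped in for `impute_rsov` and scored the same way. Simulating MAR or MNAR would require the masking probability to depend on observed or unobserved values respectively, which this sketch does not attempt.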

Supplementary information: The online version contains supplementary material available at 10.1007/s12520-024-02078-2.
Source journal
Archaeological and Anthropological Sciences (category: Geosciences, Multidisciplinary)
CiteScore: 4.80
Self-citation rate: 18.20%
Articles published: 199
About the journal: Archaeological and Anthropological Sciences covers the full spectrum of natural scientific methods with an emphasis on the archaeological contexts and the questions being studied. It bridges the gap between archaeologists and natural scientists, providing a forum to encourage the continued integration of scientific methodologies in archaeological research. Coverage in the journal includes: archaeology, geology/geophysical prospection, geoarchaeology, geochronology, palaeoanthropology, archaeozoology and archaeobotany, genetics and other biomolecules, material analysis and conservation science. The journal is endorsed by the German Society of Natural Scientific Archaeology and Archaeometry (GNAA), the Hellenic Society for Archaeometry (HSC), the Association of Italian Archaeometrists (AIAr) and the Society of Archaeological Sciences (SAS).