Machine Learning–Based Identification of Lithic Microdebitage

IF 1.9 | Zone 2, History | ARCHAEOLOGY | Advances in Archaeological Practice | Pub Date: 2023-02-16 | DOI: 10.1017/aap.2022.35
Markus Eberl, Charreau S. Bell, Jesse Spencer-Smith, M. Raj, Amanda Sarubbi, Phyllis S. Johnson, Amy E. Rieth, Umang Chaudhry, Rebecca Estrada Aguila, Michael McBride
Citations: 1

Machine Learning–Based Identification of Lithic Microdebitage
Abstract

Archaeologists tend to produce slow data that is contextually rich but often difficult to generalize. An example is the analysis of lithic microdebitage, or knapping debris, that is smaller than 6.3 mm (0.25 in.). So far, scholars have relied on manual approaches that are prone to intra- and interobserver errors. In the following, we present a machine learning–based alternative together with experimental archaeology and dynamic image analysis. We use a dynamic image particle analyzer to measure each particle in experimentally produced lithic microdebitage (N = 5,299) as well as an archaeological soil sample (N = 73,313). We have developed four machine learning models based on Naïve Bayes, glmnet (generalized linear regression), random forest, and XGBoost (“Extreme Gradient Boost[ing]”) algorithms. Hyperparameter tuning optimized each model. A random forest model performed best with a sensitivity of 83.5%. It misclassified only 28 or 0.9% of lithic microdebitage. XGBoost models reached a sensitivity of 67.3%, whereas Naïve Bayes and glmnet models stayed below 50%. Except for glmnet models, transparency proved to be the most critical variable to distinguish microdebitage. Our approach objectifies and standardizes microdebitage analysis. Machine learning allows studying much larger sample sizes. Algorithms differ, though, and a random forest model offers the best performance so far.
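The abstract compares models by sensitivity, that is, the share of true microdebitage particles that a model labels as microdebitage: TP / (TP + FN). The following is a minimal sketch of that metric only, not the authors' code. The function name `sensitivity` and the toy label lists are hypothetical and invented for illustration; they are not data from the study.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return TP / (TP + FN): the fraction of actual positives labeled positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positives: actual positive, predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: actual positive, predicted anything else.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


# Toy labels invented for illustration (1 = microdebitage, 0 = other soil particle).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 0]
toy_sensitivity = sensitivity(toy_true, toy_pred)
print(f"sensitivity = {toy_sensitivity:.2f}")
```

Sensitivity ignores false positives (here, the soil particle wrongly labeled as microdebitage), so it is usually read alongside a second measure such as specificity or precision.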
Source journal: CiteScore 3.70 | Self-citation rate 21.40% | Articles published 39