Machine Learning–Based Identification of Lithic Microdebitage

Markus Eberl, Charreau S. Bell, Jesse Spencer-Smith, M. Raj, Amanda Sarubbi, Phyllis S. Johnson, Amy E. Rieth, Umang Chaudhry, Rebecca Estrada Aguila, Michael McBride

Advances in Archaeological Practice (Journal Article, Archaeology). Published 2023-02-16. DOI: 10.1017/aap.2022.35
ABSTRACT

Archaeologists tend to produce slow data that is contextually rich but often difficult to generalize. An example is the analysis of lithic microdebitage, or knapping debris, smaller than 6.3 mm (0.25 in.). So far, scholars have relied on manual approaches that are prone to intra- and interobserver errors. In the following, we present a machine learning–based alternative that combines experimental archaeology and dynamic image analysis. We use a dynamic image particle analyzer to measure each particle in experimentally produced lithic microdebitage (N = 5,299) as well as in an archaeological soil sample (N = 73,313). We have developed four machine learning models based on Naïve Bayes, glmnet (generalized linear regression), random forest, and XGBoost ("Extreme Gradient Boost[ing]") algorithms. Hyperparameter tuning optimized each model. A random forest model performed best, with a sensitivity of 83.5%; it misclassified only 28 particles, or 0.9%, of the lithic microdebitage. XGBoost models reached a sensitivity of 67.3%, whereas Naïve Bayes and glmnet models stayed below 50%. Except for the glmnet models, transparency proved to be the most critical variable for distinguishing microdebitage. Our approach objectifies and standardizes microdebitage analysis, and machine learning allows studying much larger sample sizes. Algorithms differ, though, and a random forest model offers the best performance so far.
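The workflow the abstract describes — training a random forest on per-particle measurements and scoring it by sensitivity (recall on the microdebitage class) — can be sketched as below. This is a minimal illustration, not the authors' code: the features and labels here are synthetic stand-ins for the dynamic-image particle measurements (transparency, size, shape descriptors), and the single informative feature loosely mimics the discriminative role the paper reports for transparency.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for particle-analyzer measurements (hypothetical):
# column 0 plays the role of transparency; the others are uninformative noise.
X = rng.normal(size=(n, 3))

# Label particles microdebitage (1) vs. ordinary soil (0) with a noisy rule
# on the "transparency" feature, so the class is learnable but not trivial.
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)

# Sensitivity = recall on the positive (microdebitage) class,
# the metric the paper uses to compare its four models.
sensitivity = recall_score(y_te, clf.predict(X_te))
print(f"sensitivity: {sensitivity:.2f}")
```

In the paper, hyperparameter tuning would replace the fixed `n_estimators=200` (e.g., via a grid or random search over tree depth and count), and feature importances from the fitted forest are what single out transparency as the most critical variable.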