Title: On the use of collections with unreliably determined sex and age characteristics in model training for sex determination by traits of the standard craniometric program
Author: I.G. Shirobokov
Journal: Vestnik Archeologii, Antropologii i Etnografii
Publication date: 2023-09-15
Publication type: Journal Article
DOI: https://doi.org/10.20874/2071-0437-2023-62-3-11
Citations: 0
Abstract
The study examines the feasibility of applying machine-learning methods to determine sex from craniometric features when working with materials from archaeological excavations. A specific feature of such materials is that the sex and age of individuals are estimated subjectively. The main object of the analysis was a sample measured by V.P. Alekseev, comprising 258 crania (137 male and 121 female) that characterise the Russian population of the European part of Russia in the 17th–18th centuries. The test sample was a group of Russian crania with documented sex and age, registered in several collections of the Kunstkamera’s repository and also measured by V.P. Alekseev. The series includes 89 male and 10 female skulls, which came to the museum from the Military Medical Academy in 1911–1914 through the efforts of the Russian anatomist K.Z. Yatsuta. The models were trained, validated, and tested using four methods: discriminant analysis, logistic regression, random forest, and support vector machine. Thirty-three craniometric traits were included in the analysis, from which a group of five features with the highest differentiating ability was chosen (Martin Nos. 1, 40, 43, 45, and 75(1)). Models built on either set of traits showed comparable performance. According to the cross-validation results, all four models reproduced the sex estimates given by V.P. Alekseev in 85–88% of cases on average. When the models were applied to the test sample, the proportion of accurate classifications did not change and stood at 87–88%. At the same time, the machine-learning methods showed no noticeable advantage in classification accuracy over linear discriminant analysis. In general, the efficiency of the obtained models corresponds to the average value of the indicators calculated from 80 publications (86%).
It is likely that the crania whose sex cannot be correctly classified by the models and those that cannot be correctly classified by visual assessment constitute overlapping sets, sharing some morphological features that make them resemble individuals of the opposite sex. Applying the models to the skulls of the test sample as re-measured by the author revealed some deterioration in model performance in all four cases. The decrease in the proportion of accurate classifications is caused mainly by discrepancies in the estimation of the nasal protrusion angle, as well as by subjective errors in size estimation when the crania are insufficiently preserved and the alveolar process is partially atrophied.
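The validation workflow summarised in the abstract (cross-validated accuracy on a training sample, then accuracy on a separate test sample) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the author's procedure: the measurements are synthetic Gaussian values, a simple nearest-class-mean classifier stands in for the four methods named in the abstract, and all function names are hypothetical. Only the sample sizes (137/121 and 89/10) and the count of five traits come from the abstract, so the accuracies printed have no bearing on the study's reported figures.

```python
import random
from statistics import mean


def make_synthetic_sample(n_male, n_female, n_traits=5, seed=0):
    """Hypothetical data: Gaussian 'measurements' with an assumed sex offset."""
    rng = random.Random(seed)
    rows = []
    for label, n, shift in (("M", n_male, 1.0), ("F", n_female, -1.0)):
        for _ in range(n):
            rows.append(([rng.gauss(shift, 1.0) for _ in range(n_traits)], label))
    rng.shuffle(rows)
    return rows


def fit_centroids(rows):
    """Compute the per-class mean vector of the traits."""
    centroids = {}
    for label in {lab for _, lab in rows}:
        vectors = [x for x, lab in rows if lab == label]
        centroids[label] = [mean(col) for col in zip(*vectors)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def accuracy(centroids, rows):
    """Proportion of rows whose predicted label matches the recorded one."""
    return mean(1.0 if predict(centroids, x) == lab else 0.0 for x, lab in rows)


def cross_val_accuracy(rows, k=5):
    """Mean held-out accuracy over k folds."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(accuracy(fit_centroids(train), held_out))
    return mean(scores)


# Sample sizes follow the abstract; the values themselves are synthetic.
train_rows = make_synthetic_sample(137, 121, seed=1)
test_rows = make_synthetic_sample(89, 10, seed=2)

cv_acc = cross_val_accuracy(train_rows, k=5)
test_acc = accuracy(fit_centroids(train_rows), test_rows)
print(f"cross-validated accuracy: {cv_acc:.3f}")
print(f"test-sample accuracy:     {test_acc:.3f}")
```

Because the test sample is strongly unbalanced (89 male, 10 female), overall accuracy on it may be worth reading alongside per-sex accuracy; the sketch reports only the overall proportion, as the abstract does.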