Inter-rater reliability of stress signatures in exfoliated primary dentition - Improving scientific rigor and reproducibility in histological data collection.
Simone A M Lemmers, Mona Le Luyer, Samantha J Stoll, Alison G Hoffnagle, Rebecca J Ferrell, Julia A Gamble, Debbie Guatelli-Steinberg, Kaita N Gurian, Kate McGrath, Mackie C O'Hara, Andrew D A C Smith, Erin C Dunn
PLoS ONE 20(3): e0318700. Published 2025-03-19. DOI: 10.1371/journal.pone.0318700 (https://doi.org/10.1371/journal.pone.0318700)
Abstract
Accentuated Lines (ALs) in tooth enamel can reflect metabolic disruptions from physiological or psychological stresses during development. They can therefore serve as a retrospective biomarker of generalized stress exposure in archaeological and clinical research. However, little consensus exists on when a line should be identified as an AL, and inter-rater reliability is poorly quantified across studies. Here, we sought to address this gap by examining the reliability of AL markings across raters, in terms of both the presence versus absence of ALs and their intensity (HAL = Highly Accentuated Line, MAL = Mildly Accentuated Line, RL = Retzius Line). Ratings were made and compared across observers (with different levels of experience) and pairs of raters (who agreed on AL coding through consensus meetings) (N = 15 teeth, eight observers). Results indicated that more experience in AL assessment does not necessarily produce higher reliability between raters. Most disagreements in intensity ratings occurred in categories other than HAL. Furthermore, when AL assessment was performed by pairs of raters, reliability was significantly higher than for individual assessments (Gwet's AC1 = 0.28 to 0.56 for line presence assessment; Gwet's AC1 = 0.48 to 0.64 for line intensity assessment). Based on these results, we recommend a workflow called IRRISS (Improving Reliability and Reporting In Scoring of Stress-markers) to increase rigor and reproducibility in histological analysis of dental collections. The introduction of IRRISS is well-timed, given the surge in studies of teeth across anthropological, epidemiological, medical, forensic, and climate research fields.
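The agreement statistic reported in the abstract, Gwet's AC1, can be illustrated for the simplest case of two raters assigning one category per line. The sketch below is a minimal illustration and not the authors' analysis: the study involved eight observers, and the abstract does not state which software or multi-rater variant was used. The function name `gwet_ac1` and the example ratings are hypothetical and were made up for this example.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters and nominal categories.

    AC1 = (p_a - p_e) / (1 - p_e), where
      p_a = observed proportion of items on which the raters agree
      p_e = (1 / (q - 1)) * sum_k pi_k * (1 - pi_k)
      pi_k = average proportion of ratings (across both raters) in category k
      q    = number of categories
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    if not set(ratings_a) | set(ratings_b) <= set(categories):
        raise ValueError("found a rating that is not in `categories`")

    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement, based on the pooled category proportions of both raters.
    pooled = Counter(ratings_a) + Counter(ratings_b)
    p_e = sum(
        (pooled[k] / (2 * n)) * (1 - pooled[k] / (2 * n)) for k in categories
    ) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical intensity ratings for four lines (not data from the study).
CATEGORIES = ["HAL", "MAL", "RL"]
rater_a = ["HAL", "MAL", "RL", "HAL"]
rater_b = ["HAL", "MAL", "MAL", "HAL"]

print(round(gwet_ac1(rater_a, rater_b, CATEGORIES), 3))  # 0.644
```

In this made-up example the raters agree on three of four lines (p_a = 0.75) and the chance-agreement term is about 0.297, giving AC1 of about 0.64. For presence versus absence scoring, the same function applies with two categories.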
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage