Inter-rater reliability of stress signatures in exfoliated primary dentition - Improving scientific rigor and reproducibility in histological data collection.

IF 2.6 | Zone 3 (multidisciplinary journals) | Q1 MULTIDISCIPLINARY SCIENCES | PLoS ONE | Pub Date: 2025-03-19 | eCollection Date: 2025-01-01 | DOI: 10.1371/journal.pone.0318700

Simone A M Lemmers, Mona Le Luyer, Samantha J Stoll, Alison G Hoffnagle, Rebecca J Ferrell, Julia A Gamble, Debbie Guatelli-Steinberg, Kaita N Gurian, Kate McGrath, Mackie C O'Hara, Andrew D A C Smith, Erin C Dunn

PLoS ONE, volume 20, issue 3, e0318700. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922276/pdf/

Abstract

Accentuated Lines (ALs) in tooth enamel can reflect metabolic disruptions from physiological or psychological stresses during development. They can therefore serve as a retrospective biomarker of generalized stress exposure in archaeological and clinical research. However, little consensus exists on when ALs are identified, and inter-rater reliability is poorly quantified across studies. Here, we sought to address this gap by examining the reliability of accentuated line (AL) markings across raters, in terms of both the presence versus absence of ALs and their intensity (HAL = Highly Accentuated, MAL = Mildly Accentuated, RL = Retzius Line). Ratings were made and compared across observers (with different levels of experience) and pairs of raters (who agreed on AL coding through consensus meetings) (N = 15 teeth, eight observers). Results indicated that more experience in AL assessment does not necessarily produce higher reliability between raters. Most disagreements in intensity ratings occurred in categories other than HAL. Furthermore, when AL assessment was performed by pairs of raters, reliability was significantly higher than individual assessments (Gwet's AC1 = 0.28 to 0.56 for line presence assessment; Gwet's AC1 = 0.48 to 0.64 for line intensity assessment). Based on these results, we recommend a workflow called IRRISS (Improving Reliability and Reporting In Scoring of Stress-markers) to increase rigor and reproducibility in histological analysis of dental collections. The introduction of IRRISS is well-timed, given the surge in studies of teeth occurring across anthropological, epidemiological, medical, forensic, and climate research fields.
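
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Gwet's AC1 for two raters on a nominal scale. It is not the authors' analysis code: the abstract does not say how AC1 was computed, and a study with eight observers would likely use the multi-rater form. The function name `gwet_ac1` and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters who scored the same items (nominal scale).

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and
    p_e = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the mean proportion
    of ratings in category k across both raters and q the category count.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty item list")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    # Observed agreement: share of items on which the raters gave the same code.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement under Gwet's model.
    p_e = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Hypothetical presence/absence codes (1 = AL present, 0 = absent) for ten
# candidate lines; these values are invented, not data from the study.
rater_1 = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
presence_ac1 = gwet_ac1(rater_1, rater_2, categories=[0, 1])
print(round(presence_ac1, 3))
```

The same function can be applied to intensity codes by passing `categories=["HAL", "MAL", "RL"]`. Passing the category list explicitly, rather than inferring it from the data, keeps the chance-agreement term correct when a category happens to go unused by both raters.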

Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage