Evaluation of the departmental inter-rater reliability when scoring thyroid nodules according to the British Thyroid Association Ultrasound-classification model: Is there significant disagreement?

Ultrasound (IF 0.8, Q4, Radiology, Nuclear Medicine & Medical Imaging) · Pub Date: 2023-12-28 · DOI: 10.1177/1742271x231215500

Nabil Rtam


Abstract

The British Thyroid Association Ultrasound-classification is a risk stratification model that grades thyroid nodules (TNs) from U2 to U5 based on their sonographic appearance. Variability between ultrasound operators when U-scoring is reported in the literature, and some evidence of it was found in the author’s department. The aim of this study was to investigate whether there is significant disagreement in the department and to identify potential reasons for the variability. Eight operators, comprising radiologists and sonographers, were recruited to grade 33 TNs and answer a tick-box questionnaire using the British Thyroid Association lexicon. The inter-operator variability for the U-categories, the indication for fine-needle aspiration biopsy and the ultrasound features was assessed using Fleiss’ kappa (K) and Gwet’s AC1. The operators’ accuracy was measured against the most experienced operator in the department using Cohen’s kappa and percentage agreement. Fair agreement (Fleiss’ K = 0.21) was obtained between the participants when U-scoring (U2–5). Fair-to-moderate agreement was noted between sonographers (K = 0.40). Significant variability was demonstrated between radiologists (p > 0.05). Agreement on the indication for fine-needle aspiration biopsy ranged from fair to almost substantial (radiologists’ AC1 = 0.34, sonographers’ AC1 = 0.58, overall AC1 = 0.41). No significant variability was measured for echogenicity (K = 0.29), composition (K = 0.33), shape (K = 0.58), margin (K = 0.45), halo (K = 0.34) and vascularity (K = 0.44). Accuracy reached fair agreement (mean Cohen’s K = 0.29) for the U-categories and moderate agreement (mean AC1 = 0.53) for fine-needle aspiration biopsy. Radiologists demonstrated lower accuracy. No significant inter-rater variability in U-scoring or in recommending fine-needle aspiration biopsy was demonstrated across all the operators in the department. Radiologists showed significant variability in U-scoring and lower accuracy. Reliability and accuracy could be improved by addressing the problematic categories and features identified in this study.
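
The agreement statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the author's analysis code: the function names and the toy ratings are hypothetical, every nodule is assumed to be graded by the same number of operators, and the verbal labels assume the commonly used Landis and Koch scale, which the abstract's wording appears to follow but does not name.

```python
from collections import Counter

def count_table(ratings, categories):
    """Convert per-nodule lists of grades into per-nodule category counts."""
    return [[Counter(row)[c] for c in categories] for row in ratings]

def observed_agreement(table):
    """Mean proportion of agreeing rater pairs per nodule (equal raters assumed)."""
    n = sum(table[0])  # raters per nodule
    per_nodule = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    return sum(per_nodule) / len(per_nodule)

def category_shares(table):
    """Overall share of all ratings that fall into each category."""
    total = sum(sum(row) for row in table)
    return [sum(row[j] for row in table) / total for j in range(len(table[0]))]

def fleiss_kappa(table):
    """Fleiss' kappa: chance agreement is the sum of squared category shares.
    Undefined (division by zero) if every rating falls in one category."""
    p_a = observed_agreement(table)
    p_e = sum(p * p for p in category_shares(table))
    return (p_a - p_e) / (1 - p_e)

def gwet_ac1(table):
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (q - 1)."""
    p_a = observed_agreement(table)
    shares = category_shares(table)
    p_e = sum(p * (1 - p) for p in shares) / (len(shares) - 1)
    return (p_a - p_e) / (1 - p_e)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for one operator against a reference operator.
    Undefined (division by zero) if both raters use a single identical grade."""
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n  # percentage agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def landis_koch(value):
    """Verbal label on the Landis and Koch scale (assumed, see lead-in)."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"

# Invented toy data: four nodules, two operators, indication for biopsy (yes/no).
toy = [["yes", "yes"], ["yes", "yes"], ["yes", "yes"], ["yes", "no"]]
table = count_table(toy, ["yes", "no"])
print(round(fleiss_kappa(table), 2), round(gwet_ac1(table), 2))  # -0.14 0.68
```

On this skewed toy data the operators agree on three of four nodules, yet Fleiss' kappa is slightly negative while AC1 is 0.68. This sensitivity of kappa to unbalanced category prevalence is one common reason for reporting Gwet's AC1 alongside it.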


Source journal

Ultrasound (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 1.70

Self-citation rate: 0.00%

About the journal: Ultrasound is the official journal of the British Medical Ultrasound Society (BMUS), a multidisciplinary, charitable society comprising radiologists, obstetricians, sonographers, physicists and veterinarians, amongst others.