Evaluation of Within- and Between-Site Agreement for Direct Observation of Physical Behavior Across Four Research Groups

S. Keadle, Julian Martinez, S. Strath, J. Sirard, D. John, S. Intille, Diego Arguello, Marcos Amalbert-Birriel, Rachel Barnett, B. Thapa-Chhetry, Melanna Cox, John Chase, Erin E. Dooley, Robert Marcotte, Alex Tolas, John W. Staudenmayer

Journal for the Measurement of Physical Behaviour, 2023. DOI: 10.1123/jmpb.2022-0048
Abstract
Direct observation (DO) is a widely accepted ground-truth measure, but the field lacks standard operational definitions. Research groups develop project-specific annotation platforms, which limits the utility of DO when labels are not consistent.

Purpose: To evaluate within- and between-site agreement for DO taxonomies (e.g., activity intensity category) across four independent research groups that have used video-recorded DO.

Methods: Each site contributed video files (508 min) and had two trained research assistants annotate the shared video files according to their existing annotation protocols. The authors calculated (a) within-site agreement between the two coders at the same site, expressed as an intraclass correlation, and (b) between-site agreement, defined as the proportion of seconds on which any two coders agree, regardless of site.

Results: Within-site agreement at all sites was good to excellent for both activity intensity categories (intraclass correlation range: .82–.90) and posture/whole-body movement (intraclass correlation range: .77–.98). Between-site agreement for intensity categories was 94.6% for sedentary, 80.9% for light, and 82.8% for moderate–vigorous. Three of the four sites shared common labels for eight posture/whole-body movements, with within-site agreement of 94.5% and between-site agreement of 86.1%.

Conclusions: Distinct research groups can annotate key features of physical behavior with good-to-excellent interrater reliability. Operational definitions for core metrics are provided for researchers to consider in future studies; these definitions facilitate between-study comparisons and data pooling, enabling the deployment of deep learning approaches to wearable device algorithm calibration.
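To make the between-site agreement metric concrete, the sketch below computes the proportion of seconds on which two coders assign the same label, assuming each coder's annotations are stored as one label per second. This is a minimal illustration of the metric as described in the abstract, not the authors' actual pipeline; the function name, category labels, and data are hypothetical.

```python
from typing import Sequence

def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Proportion of seconds on which two coders' labels match.

    Each input is a second-by-second stream of labels (illustrative
    assumption about how annotations are stored).
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("annotation streams must cover the same seconds")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Example: two hypothetical coders labeling six seconds of video with
# the intensity categories named in the abstract ("mvpa" here stands
# in for moderate-vigorous).
a = ["sedentary", "sedentary", "light", "light", "mvpa", "mvpa"]
b = ["sedentary", "sedentary", "light", "mvpa", "mvpa", "mvpa"]
print(f"between-coder agreement: {percent_agreement(a, b):.1%}")  # 83.3%
```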