Bone age assessment by multi-granularity and multi-attention feature encoding
Bowen Liu, Yulin Huang, Shaowei Li, Jinshui He, Dongxu Zhang
Quantitative Imaging in Medicine and Surgery (epub 2024-02-19; publication date 2024-08-01)
DOI: 10.21037/qims-23-806
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11320534/pdf/
Abstract
Background: Bone age assessment (BAA) is crucial for diagnosing growth disorders and optimizing treatment. However, random error arising from differences in observers' experience, together with the low consistency of repeated assessments, degrades the quality of these assessments. Automated assessment methods are therefore needed.
Methods: Previous research has designed localization modules, trained in a strongly or weakly supervised fashion, that aggregate part regions to better recognize subtle differences. In contrast, we sought to pass information efficiently between regions of multiple granularities for fine-grained feature learning, and to model long-distance relationships directly for global understanding. We named the proposed method the Multi-Granularity and Multi-Attention Net (2M-Net). Specifically, we first applied the jigsaw method to generate related tasks that emphasize regions of different granularities, and then trained the model on these tasks using a hierarchical sharing mechanism. In effect, the training signals from the extra tasks served as an inductive bias, enabling 2M-Net to discover task relatedness without the need for annotations. Next, a self-attention mechanism was inserted as a plug-and-play module to enhance feature representation. Finally, multi-scale features were used for prediction.
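The abstract gives no implementation details for the jigsaw step. In fine-grained recognition, the jigsaw method usually means cutting an image into an n x n grid of patches and randomly permuting them, so that finer grids force the network to rely on smaller local regions. The NumPy sketch below generates one shuffled view per granularity under that assumption; the function names and the grid sizes (8, 4, 2, 1) are hypothetical choices for illustration, not values reported in the paper.

```python
import numpy as np


def jigsaw_shuffle(image: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Split an (H, W) or (H, W, C) image into an n x n grid and shuffle the patches.

    Assumes H and W are divisible by n (crop or resize beforehand otherwise).
    """
    h, w = image.shape[:2]
    if h % n or w % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = h // n, w // n
    # Cut the image into n*n patches in row-major order.
    patches = [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(n) for c in range(n)]
    order = rng.permutation(n * n)
    # Lay the permuted patches out row by row, then stack the rows vertically.
    rows = [np.concatenate([patches[order[r * n + c]] for c in range(n)], axis=1)
            for r in range(n)]
    return np.concatenate(rows, axis=0)


def multi_granularity_views(image: np.ndarray, granularities=(8, 4, 2, 1), seed: int = 0):
    """Return {n: shuffled view} for each grid size n (hypothetical sizes).

    n = 1 leaves the image unchanged, i.e. the coarsest, global view.
    """
    rng = np.random.default_rng(seed)
    return {n: jigsaw_shuffle(image, n, rng) for n in granularities}
```

In the multi-task setting described above, each granularity would define one related training task; this sketch covers only the input generation, not the hierarchical sharing mechanism used to train on those tasks.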
Results: A public dataset of 14,236 hand radiographs provided by the Radiological Society of North America (RSNA) was used to develop and validate 2M-Net. In the public benchmark test, the mean absolute error (MAE) between the model's bone age estimates and those of the reviewer was 3.98 months (3.89 months for males and 4.07 months for females).
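The MAE reported above averages the absolute difference between predicted and reference bone age over the test images; the per-sex figures apply the same formula to each subset. A minimal plain-Python sketch, where the function names and the toy values in the comments are hypothetical rather than RSNA data:

```python
from statistics import mean


def mae(pred, ref):
    """Mean absolute error (in months) between two equal-length sequences."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - r) for p, r in zip(pred, ref))


def mae_by_sex(pred, ref, sex):
    """Overall MAE plus the MAE restricted to each sex label (e.g. 'M', 'F')."""
    result = {"all": mae(pred, ref)}
    for label in sorted(set(sex)):
        idx = [i for i, s in enumerate(sex) if s == label]
        result[label] = mae([pred[i] for i in idx], [ref[i] for i in idx])
    return result


# Toy example: absolute errors are 4, 3, 0, 6 months.
# mae_by_sex([120, 100, 150, 90], [124, 97, 150, 96], ["M", "F", "M", "F"])
# gives {"all": 3.25, "F": 4.5, "M": 2}
```

Because every image carries equal weight, the overall MAE is in general a weighted average of the per-sex MAEs, with weights proportional to the number of images in each group.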
Conclusions: By using the jigsaw method to construct a multi-task learning strategy and inserting a self-attention module for efficient global modeling, we developed 2M-Net, whose performance is comparable to that of the previous best method.
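The abstract does not specify which self-attention variant 2M-Net inserts. One widely used plug-and-play form is a non-local (SAGAN-style) block: every spatial position of a CNN feature map attends to every other position, and the result is added back through a residual connection, so the block preserves the feature-map shape and can be placed between existing layers. The NumPy sketch below follows that assumption; the projection matrices `w_q`, `w_k`, `w_v` and the residual scale `gamma` are hypothetical parameters, not details taken from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Self-attention over the H*W positions of a (C, H, W) feature map.

    w_q, w_k: (C, D) query/key projections; w_v: (C, C) value projection.
    Returns feat + gamma * attended values, with the same (C, H, W) shape.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (N, C): one row per position
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # (N, D), (N, D), (N, C)
    # Scaled dot-product scores between all pairs of positions.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, N), rows sum to 1
    out = attn @ v                               # (N, C)
    # Map rows back to spatial positions and add the residual.
    return feat + gamma * out.T.reshape(c, h, w)
```

Since the attention matrix has (HW)^2 entries, blocks of this kind are usually placed on low-resolution, late-stage feature maps. SAGAN learns `gamma` and initializes it at 0, so the block starts as an identity mapping and the pretrained backbone is undisturbed at the start of training.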