Using synthetic dataset for semantic segmentation of the human body in the problem of extracting anthropometric data

Frontiers in Artificial Intelligence. Pub Date: 2024-08-09. DOI: 10.3389/frai.2024.1336320

Azat Absadyk, Olzhas Turar, Darkhan Akhmed-Zaki


Abstract

The COVID-19 pandemic highlighted the need for accurate virtual sizing in e-commerce to reduce returns and waste. Existing methods for extracting anthropometric data from images have limitations. This study aims to develop a semantic segmentation model, trained on synthetic data, that can accurately determine body shape from real images while accounting for clothing.

A synthetic dataset of over 22,000 images was created using NVIDIA Omniverse Replicator, featuring human models in various poses, clothing, and environments. Popular CNN architectures (U-Net, SegNet, DeepLabV3, PSPNet) with different backbones were trained on this dataset for semantic segmentation. Models were evaluated on accuracy, precision, recall, and intersection over union (IoU). The best-performing model was tested on real human subjects, and its output was compared with actual measurements.

U-Net with an EfficientNet backbone performed best, with 99.83% training accuracy and an IoU score of 0.977. When tested on real images, it accurately segmented body shape while accounting for clothing. Comparison with actual measurements on 9 subjects showed average deviations of −0.24 cm for neck, −0.1 cm for shoulder, 1.15 cm for chest, −0.22 cm for waist, and 0.17 cm for hip measurements.

The synthetic dataset and trained models enable accurate extraction of anthropometric data from real images while accounting for clothing. This approach has significant potential for improving virtual fitting and reducing returns in e-commerce. Future work will focus on refining the algorithm, particularly for waist and hip measurements, which showed higher variability.
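The four evaluation metrics named in the abstract (accuracy, precision, recall, IoU) can be illustrated for a single binary body/background mask. The following is a minimal sketch, not the authors' evaluation code: the function name `segmentation_metrics`, the list-of-lists mask representation, and the toy masks are all assumptions made for illustration.

```python
from typing import Dict, Sequence

# Hypothetical mask type: rows of 0/1 pixels (1 = body, 0 = background).
Mask = Sequence[Sequence[int]]


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-wise accuracy, precision, recall and IoU for two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # body pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as body
            elif not p and t:
                fn += 1  # body predicted as background
            else:
                tn += 1  # background correctly predicted
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # IoU = overlap of the two body regions divided by their union.
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy 3x4 masks, invented for illustration only (not data from the study).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

On these toy masks, five body pixels overlap, one is a false positive, and one is missed, so IoU is 5/7 while accuracy is 10/12. IoU ignores true-negative background pixels, which is why it is usually the stricter number when the background dominates the image.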


