Title: From bias to balance: Leverage representation learning for bias-free MoCap solving
Authors: Georgios Albanis, Nikolaos Zioulis, Spyridon Thermos, Anargyros Chatzitofis, Kostas Kolomvatsos
Journal: Computer Vision and Image Understanding, vol. 251, Article 104241
Publication date: 2025-02-01
DOI: 10.1016/j.cviu.2024.104241
URL: https://www.sciencedirect.com/science/article/pii/S1077314224003229
Impact factor: 4.3; JCR: Q2 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Optical systems still dominate motion capture (MoCap) and remain the gold standard. However, even the raw data captured by such systems suffer from high-frequency noise and from errors caused by ghost or occluded markers. A post-processing step is therefore often required to clean up the data, which is typically tedious and time-consuming. Some studies have tried to address these issues in a data-driven manner, leveraging the availability of MoCap data. However, such data are highly redundant, as a motion cycle usually consists of similar poses (e.g. standing still). These redundancies degrade the performance of those methods, especially on rarer poses. In this work, we address the issue of long-tailed data distribution by leveraging representation learning. We introduce a novel technique for imbalanced regression that does not require additional data or labels. Our approach uses a Mahalanobis distance-based method to automatically identify rare samples and reweight them during training, and employs high-order interpolation algorithms to sample the latent space of a Variational Autoencoder (VAE) and generate new tail samples. We show that the proposed approach significantly improves the results, especially on tail samples, while remaining model-agnostic and applicable across various architectures.
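To make the reweighting idea concrete, the following is a minimal sketch of how Mahalanobis distance could be used to flag rare samples and weight them more heavily. It is not the authors' implementation: the function name `mahalanobis_weights`, the synthetic 4-dimensional "latent" vectors, the regularization term `eps`, and the choice of weights proportional to distance (normalized to mean 1) are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_weights(latents, eps=1e-6):
    """Hypothetical helper: weight each sample by its Mahalanobis
    distance from the mean of the latent codes (mean weight = 1)."""
    mu = latents.mean(axis=0)
    centered = latents - mu
    # Regularize the covariance so it stays invertible (assumption).
    cov = np.cov(latents, rowvar=False) + eps * np.eye(latents.shape[1])
    cov_inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance per sample: x^T S^{-1} x.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    dist = np.sqrt(d2)
    # Assumed weighting rule: proportional to distance, mean-normalized.
    weights = dist / dist.mean()
    return dist, weights


# Synthetic stand-in for encoder outputs: many similar poses, few rare ones.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 1.0, size=(500, 4))
rare = rng.normal(6.0, 1.0, size=(5, 4))
latents = np.vstack([common, rare])

dist, weights = mahalanobis_weights(latents)
```

In a training loop, such weights would typically multiply the per-sample regression loss, so that samples far from the bulk of the data contribute more to the gradient.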
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems