Model Free Method of Screening Training Data for Adversarial Datapoints Through Local Lipschitz Quotient Analysis

IF 4.6 | Zone 2 (Computer Science) | Q2 Robotics | IEEE Robotics and Automation Letters | Pub Date: 2024-10-17 | DOI: 10.1109/LRA.2024.3483628

Emily Kamienski;Harry Asada

IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11122-11129. https://ieeexplore.ieee.org/document/10721399/

Citations: 0

Abstract

It is often challenging to pick suitable data features for learning problems. Sometimes certain regions of the data are harder to learn because they are not well characterized by the selected data features. The challenge is amplified when sensing and computation resources are limited, yet reliable decisions must be made under time-critical conditions. For example, a robotic system for preventing falls of elderly people needs a real-time fall predictor, with low false positive and false negative rates, that uses a simple wearable sensor to activate a fall prevention mechanism. Here we present a methodology for assessing the learnability of data based on the Lipschitz quotient. We develop a procedure for determining which regions of the dataset contain adversarial data points: input data that look similar but belong to different target classes. Such data will be hard to learn regardless of the learning model. We then present a method for determining which additional feature(s) are most effective in improving the predictability of each of these regions. This is a model-independent data analysis that can be executed before constructing a prediction model through machine learning or other techniques. We demonstrate this method on two synthetic datasets and on a dataset of human falls that uses inertial measurement unit signals. For the fall dataset, we identified two groups of adversarial data points and, by using two different sets of features, improved the predictability of each group over the baseline dataset, as assessed by the Lipschitz quotient. This work offers a valuable tool for assessing data learnability that can be applied not only to fall prediction problems but also to other robotics applications that learn from data.
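The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the common definition of the Lipschitz quotient for a pair of samples, |y_i - y_j| / ||x_i - x_j||, and flags points that take part in any pair whose quotient exceeds a threshold. The function names, the threshold value, and the toy data are all hypothetical.

```python
import math

def lipschitz_quotients(X, y):
    """Return {(i, j): |y_i - y_j| / ||x_i - x_j||} for every pair i < j."""
    quotients = {}
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dist = math.dist(X[i], X[j])
            if dist == 0.0:
                # Identical inputs: the quotient is unbounded if the labels differ.
                q = math.inf if y[i] != y[j] else 0.0
            else:
                q = abs(y[i] - y[j]) / dist
            quotients[(i, j)] = q
    return quotients

def flag_adversarial(X, y, threshold):
    """Sorted indices of points in at least one pair whose quotient exceeds threshold."""
    flagged = set()
    for (i, j), q in lipschitz_quotients(X, y).items():
        if q > threshold:
            flagged.update((i, j))
    return sorted(flagged)

# Toy data (made up): two well-separated classes plus one close pair
# (indices 4 and 5) whose inputs look similar but whose labels differ.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5), (2.5, 2.6)]
y = [0, 0, 1, 1, 0, 1]
flagged = flag_adversarial(X, y, threshold=5.0)
print(flagged)  # [4, 5]

# Hypothetical extra feature that separates the close pair: the distance
# between points 4 and 5 grows, so their quotient drops below the threshold.
extra = [0.0, 0.0, 0.0, 0.0, 0.0, 3.0]
X_aug = [x + (f,) for x, f in zip(X, extra)]
flagged_aug = flag_adversarial(X_aug, y, threshold=5.0)
print(flagged_aug)  # []
```

Because the check uses only inputs and labels, it runs before any model is trained, which is the model-independent property the abstract emphasizes. Comparing the flagged set before and after adding a candidate feature gives a rough, illustrative way to judge whether that feature helps a difficult region.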

Source journal

IEEE Robotics and Automation Letters Computer Science-Computer Science Applications

CiteScore

9.60

Self-citation rate

15.40%

Articles published

1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.