The Ungrounded Alignment Problem
Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones
arXiv:2408.04242, arXiv - CS - Neural and Evolutionary Computing, published 2024-08-08
Citations: 0
Abstract
Modern machine learning systems have demonstrated substantial abilities with
methods that either embrace or ignore human-provided knowledge, but combining
benefits of both styles remains a challenge. One particular challenge involves
designing learning systems that exhibit built-in responses to specific abstract
stimulus patterns, yet are still plastic enough to be agnostic about the
modality and exact form of their inputs. In this paper, we investigate what we
call The Ungrounded Alignment Problem, which asks: how can we build
predefined knowledge into a system where we don't know how a given stimulus will
be grounded? This paper examines a simplified version of the general problem,
where an unsupervised learner is presented with a sequence of images for the
characters in a text corpus, and this learner is later evaluated on its ability
to recognize specific (possibly rare) sequential patterns. Importantly, the
learner is given no labels during learning or evaluation, but must map images
from an unknown font or permutation to their correct class labels. That is, at no
point is our learner given labeled images, where an image vector is explicitly
associated with a class label. Despite ample work in unsupervised and
self-supervised loss functions, all current methods require a labeled
fine-tuning phase to map the learned representations to correct classes.
Finding this mapping in the absence of labels may seem a fool's errand, but our
main result resolves this seeming paradox. We show that leveraging only letter
bigram frequencies is sufficient for an unsupervised learner both to reliably
associate images with class labels and to reliably identify trigger words in the
sequence of inputs. More generally, this method suggests an approach for
encoding specific desired innate behaviour in modality-agnostic models.
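
The core idea can be sketched in miniature. The following is not the paper's method, but a toy illustration of why bigram statistics alone can pin down an unknown symbol-to-letter mapping: a secret relabeling (standing in for the unknown font or permutation) is applied to a corpus, and the learner recovers it with no labeled examples by searching for the mapping whose relabeled bigram counts best match the known bigram counts of the language. The alphabet, corpus, and brute-force search are all simplifications chosen to keep the sketch short.

```python
import itertools
from collections import Counter

def bigram_counts(seq):
    """Count adjacent symbol pairs in a sequence."""
    return Counter(zip(seq, seq[1:]))

alphabet = "aenst"  # tiny alphabet so exhaustive search stays cheap
corpus = "tenetsenseassetstatesane"  # stand-in for the known text corpus

# Simulate an unknown "font": a secret relabeling of every character.
secret = dict(zip(alphabet, "ensta"))  # a->e, e->n, n->s, s->t, t->a
observed = "".join(secret[c] for c in corpus)

ref = bigram_counts(corpus)    # built-in knowledge: bigram stats of the language
obs = bigram_counts(observed)  # all the unlabeled learner ever sees

def score(mapping):
    """How closely observed bigrams, relabeled by `mapping`, match the reference."""
    remapped = Counter({(mapping[a], mapping[b]): n for (a, b), n in obs.items()})
    return -sum(abs(ref[bg] - remapped[bg]) for bg in set(ref) | set(remapped))

# Search all candidate mappings from observed symbols to letters.
best = max(
    (dict(zip(alphabet, perm)) for perm in itertools.permutations(alphabet)),
    key=score,
)

# The recovered mapping inverts the secret relabeling, with no labels used.
print(best == {v: k for k, v in secret.items()})  # True
```

For a realistic 26-letter alphabet the brute-force search above is infeasible; the same matching objective would need an assignment or hill-climbing solver, but the signal being exploited is the same one the abstract describes.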