{"title":"On the Limitations of Compute Thresholds as a Governance Strategy","authors":"Sara Hooker","doi":"arxiv-2407.05694","DOIUrl":null,"url":null,"abstract":"At face value, this essay is about understanding a fairly esoteric governance\ntool called compute thresholds. However, in order to grapple with whether these\nthresholds will achieve anything, we must first understand how they came to be.\nThis requires engaging with a decades-old debate at the heart of computer\nscience progress, namely, is bigger always better? Hence, this essay may be of\ninterest not only to policymakers and the wider public but also to computer\nscientists interested in understanding the role of compute in unlocking\nbreakthroughs. Does a certain inflection point of compute result in changes to\nthe risk profile of a model? This discussion is increasingly urgent given the\nwide adoption of governance approaches that suggest greater compute equates\nwith higher propensity for harm. Several leading frontier AI companies have\nreleased responsible scaling policies. Both the White House Executive Orders on\nAI Safety (EO) and the EU AI Act encode the use of FLOP or floating-point\noperations as a way to identify more powerful systems. What is striking about\nthe choice of compute thresholds to-date is that no models currently deployed\nin the wild fulfill the current criteria set by the EO. This implies that the\nemphasis is often not on auditing the risks and harms incurred by currently\ndeployed models - but rather is based upon the belief that future levels of\ncompute will introduce unforeseen new risks. A key conclusion of this essay is\nthat compute thresholds as currently implemented are shortsighted and likely to\nfail to mitigate risk. Governance that is overly reliant on compute fails to\nunderstand that the relationship between compute and risk is highly uncertain\nand rapidly changing. It also overestimates our ability to predict what\nabilities emerge at different scales. This essay ends with recommendations for\na better way forward.","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"2016 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Emerging Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.05694","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. This requires engaging with a decades-old debate at the heart of computer science progress, namely: is bigger always better? Hence, this essay may be of interest not only to policymakers and the wider public but also to computer scientists interested in understanding the role of compute in unlocking breakthroughs. Does a certain inflection point of compute change the risk profile of a model? This discussion is increasingly urgent given the wide adoption of governance approaches that assume greater compute equates with a higher propensity for harm. Several leading frontier AI companies have released responsible scaling policies, and both the White House Executive Order on AI Safety (EO) and the EU AI Act encode the use of FLOP, or floating-point operations, as a way to identify more powerful systems. What is striking about the compute thresholds chosen to date is that no models currently deployed in the wild meet the criteria set by the EO. This implies that the emphasis is often not on auditing the risks and harms incurred by currently deployed models, but rather rests on the belief that future levels of compute will introduce unforeseen new risks. A key conclusion of this essay is that compute thresholds as currently implemented are shortsighted and likely to fail to mitigate risk. Governance that is overly reliant on compute fails to recognize that the relationship between compute and risk is highly uncertain and rapidly changing, and it overestimates our ability to predict which abilities emerge at different scales. The essay ends with recommendations for a better way forward.
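To make the threshold mechanism concrete, here is a minimal Python sketch, not taken from the essay, that estimates training compute with the widely used rule of thumb FLOP ≈ 6 × parameters × training tokens and compares the result against the publicly stated figures: the EO's 10^26 FLOP reporting threshold and the EU AI Act's 10^25 FLOP presumption of systemic risk. The model names, parameter counts, and token budgets below are illustrative assumptions, not figures from the paper.

```python
# Hedged sketch: training-compute estimation via the common 6*N*D
# approximation, checked against the FLOP thresholds the abstract
# discusses. Model configurations below are hypothetical examples.

EO_THRESHOLD_FLOP = 1e26      # US Executive Order reporting threshold
EU_AI_ACT_THRESHOLD_FLOP = 1e25  # EU AI Act systemic-risk presumption

def estimated_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough estimate: ~6 FLOP per parameter per training token."""
    return 6.0 * n_params * n_tokens

# Hypothetical model configs: (parameter count, training tokens).
models = {
    "small-7B":    (7e9,  2e12),   # 7B params, 2T tokens
    "large-70B":   (70e9, 15e12),  # 70B params, 15T tokens
    "frontier-1T": (1e12, 30e12),  # 1T params, 30T tokens
}

for name, (params, tokens) in models.items():
    flop = estimated_training_flop(params, tokens)
    eo = "over" if flop >= EO_THRESHOLD_FLOP else "under"
    eu = "over" if flop >= EU_AI_ACT_THRESHOLD_FLOP else "under"
    print(f"{name}: ~{flop:.1e} FLOP | EO: {eo} | EU AI Act: {eu}")
```

Under these assumed configurations, even the 70B-parameter model lands around 6×10^24 FLOP, below both thresholds, which illustrates the abstract's observation that no currently deployed model meets the EO criteria. It also hints at the essay's core critique: the 6ND figure says nothing about capability per FLOP, which efficiency gains in data, architectures, and training methods keep shifting under any fixed threshold.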