Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn
arXiv - MATH - Algebraic Geometry, published 2024-08-30, arxiv-2408.17221, https://doi.org/arxiv-2408.17221
Geometry of Lightning Self-Attention: Identifiability and Dimension
We consider function spaces defined by self-attention networks without
normalization, and theoretically analyze their geometry. Since these networks
are polynomial, we rely on tools from algebraic geometry. In particular, we
study the identifiability of deep attention by providing a description of the
generic fibers of the parametrization for an arbitrary number of layers and, as
a consequence, compute the dimension of the function space. Additionally, for a
single-layer model, we characterize the singular and boundary points. Finally,
we formulate a conjectural extension of our results to normalized
self-attention networks, prove it for a single layer, and numerically verify it
in the deep case.
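
As an illustration of why unnormalized attention is polynomial and why its parameters are not uniquely identifiable, the following is a minimal NumPy sketch of a single layer. The layout (tokens as rows of `X`), the names `Wq`, `Wk`, `Wv`, and the function `lightning_attention` are assumptions made for this example and may differ from the paper's own conventions. The sketch checks numerically that the layer is cubic in its input and that two simple parameter changes leave the function unchanged.

```python
import numpy as np

def lightning_attention(X, Wq, Wk, Wv):
    """Single self-attention layer without softmax normalization.

    X  : (t, d) input sequence, one token per row (assumed layout).
    Wq : (d, a) query weights, Wk : (d, a) key weights, Wv : (d, d) value weights.
    """
    scores = (X @ Wq) @ (X @ Wk).T   # (t, t), quadratic in X
    return scores @ (X @ Wv)         # (t, d), cubic in X overall

rng = np.random.default_rng(0)
t, d, a = 4, 3, 2
X = rng.standard_normal((t, d))
Wq = rng.standard_normal((d, a))
Wk = rng.standard_normal((d, a))
Wv = rng.standard_normal((d, d))

Y = lightning_attention(X, Wq, Wk, Wv)

# Polynomial structure: scaling the input by 2 scales the output by 2**3.
cubic_ok = np.allclose(lightning_attention(2.0 * X, Wq, Wk, Wv), 8.0 * Y)

# Non-identifiability 1: queries and keys enter only through Wq @ Wk.T,
# so (Wq C, Wk C^{-T}) gives the same function for any invertible C.
C = np.array([[2.0, 1.0],
              [0.0, 0.5]])
same_fiber_ok = np.allclose(
    lightning_attention(X, Wq @ C, Wk @ np.linalg.inv(C).T, Wv), Y
)

# Non-identifiability 2: a scalar can be moved between the attention
# weights and the value weights without changing the output.
scale_ok = np.allclose(lightning_attention(X, 5.0 * Wq, Wk, Wv / 5.0), Y)

print(cubic_ok, same_fiber_ok, scale_ok)
```

These two checks only exhibit some parameter changes that preserve the function of one layer; they do not by themselves describe the full generic fiber, which is what the paper characterizes for an arbitrary number of layers.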