Terminating Differentiable Tree Experts
Jonathan Thomm, Michael Hersche, Giacomo Camposampiero, Aleksandar Terzić, Bernhard Schölkopf, Abbas Rahimi
arXiv:2407.02060 · arXiv - CS - Symbolic Computation · 2024-07-02
{"title":"终止可微分树专家","authors":"Jonathan Thomm, Michael Hersche, Giacomo Camposampiero, Aleksandar Terzić, Bernhard Schölkopf, Abbas Rahimi","doi":"arxiv-2407.02060","DOIUrl":null,"url":null,"abstract":"We advance the recently proposed neuro-symbolic Differentiable Tree Machine,\nwhich learns tree operations using a combination of transformers and Tensor\nProduct Representations. We investigate the architecture and propose two key\ncomponents. We first remove a series of different transformer layers that are\nused in every step by introducing a mixture of experts. This results in a\nDifferentiable Tree Experts model with a constant number of parameters for any\narbitrary number of steps in the computation, compared to the previous method\nin the Differentiable Tree Machine with a linear growth. Given this flexibility\nin the number of steps, we additionally propose a new termination algorithm to\nprovide the model the power to choose how many steps to make automatically. The\nresulting Terminating Differentiable Tree Experts model sluggishly learns to\npredict the number of steps without an oracle. It can do so while maintaining\nthe learning capabilities of the model, converging to the optimal amount of\nsteps.","PeriodicalId":501033,"journal":{"name":"arXiv - CS - Symbolic Computation","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Terminating Differentiable Tree Experts\",\"authors\":\"Jonathan Thomm, Michael Hersche, Giacomo Camposampiero, Aleksandar Terzić, Bernhard Schölkopf, Abbas Rahimi\",\"doi\":\"arxiv-2407.02060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We advance the recently proposed neuro-symbolic Differentiable Tree Machine,\\nwhich learns tree operations using a combination of transformers and Tensor\\nProduct Representations. We investigate the architecture and propose two key\\ncomponents. We first remove a series of different transformer layers that are\\nused in every step by introducing a mixture of experts. This results in a\\nDifferentiable Tree Experts model with a constant number of parameters for any\\narbitrary number of steps in the computation, compared to the previous method\\nin the Differentiable Tree Machine with a linear growth. Given this flexibility\\nin the number of steps, we additionally propose a new termination algorithm to\\nprovide the model the power to choose how many steps to make automatically. The\\nresulting Terminating Differentiable Tree Experts model sluggishly learns to\\npredict the number of steps without an oracle. 
It can do so while maintaining\\nthe learning capabilities of the model, converging to the optimal amount of\\nsteps.\",\"PeriodicalId\":501033,\"journal\":{\"name\":\"arXiv - CS - Symbolic Computation\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Symbolic Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.02060\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Symbolic Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.02060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We advance the recently proposed neuro-symbolic Differentiable Tree Machine, which learns tree operations using a combination of transformers and Tensor Product Representations. We investigate the architecture and propose two key components. First, we replace the series of distinct transformer layers, one of which is applied at every step, with a Mixture of Experts. The resulting Differentiable Tree Experts model has a constant number of parameters for an arbitrary number of computation steps, whereas the parameter count of the original Differentiable Tree Machine grows linearly with the number of steps.
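To make the parameter argument concrete, below is a minimal PyTorch-style sketch, not the authors' implementation; all module names, sizes, and the soft-routing scheme are assumptions for illustration. A per-step stack needs a new transformer layer for every step, while a single Mixture-of-Experts block reused at every step keeps the parameter count constant regardless of how many steps are run.

```python
# Minimal sketch (not the authors' code) contrasting the two designs.
# All hyperparameters and the soft-routing scheme are illustrative.
import torch
import torch.nn as nn


class PerStepLayers(nn.Module):
    """A distinct transformer layer per step: parameters grow linearly."""

    def __init__(self, d_model: int, num_steps: int):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_steps)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:  # a different layer at every step
            x = layer(x)
        return x


class SharedMoE(nn.Module):
    """One Mixture-of-Experts block reused at every step: constant parameters."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor, num_steps: int) -> torch.Tensor:
        for _ in range(num_steps):  # the same weights are reused at every step
            gate = torch.softmax(self.router(x.mean(dim=1)), dim=-1)  # (B, E)
            outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, T, D)
            x = torch.einsum("be,betd->btd", gate, outs)              # soft mixture
        return x
```

Under these assumptions, PerStepLayers(64, num_steps=8) holds eight full transformer layers, while SharedMoE(64, num_experts=4) holds four experts and one router no matter how many steps are unrolled.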
Given this flexibility in the number of steps, we additionally propose a new termination algorithm that lets the model choose the number of steps automatically. The resulting Terminating Differentiable Tree Experts model learns, albeit slowly, to predict the number of steps without an oracle. It does so while maintaining the learning capabilities of the model, converging to the optimal number of steps.
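The abstract does not spell out the termination algorithm, so the following is only a hypothetical sketch of one standard way to let a model choose its own step count: an ACT-style halting head that emits a halt probability at every step, making the effective number of steps a learned, differentiable quantity. HaltingLoop, halt_head, and max_steps are assumed names, not the paper's.

```python
# Hypothetical halting loop (ACT-style); the paper's actual termination
# algorithm is not described in the abstract. step_fn stands in for one
# computation step, e.g. a shared transformer or MoE block.
import torch
import torch.nn as nn


class HaltingLoop(nn.Module):
    def __init__(self, d_model: int, step_fn: nn.Module, max_steps: int = 16):
        super().__init__()
        self.step_fn = step_fn
        self.halt_head = nn.Linear(d_model, 1)  # emits a halt logit per step
        self.max_steps = max_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        remainder = x.new_ones(x.shape[0])   # probability mass still running
        weighted = torch.zeros_like(x)       # halt-weighted sum of states
        for step in range(self.max_steps):
            x = self.step_fn(x)
            p_halt = torch.sigmoid(self.halt_head(x.mean(dim=1))).squeeze(-1)
            if step == self.max_steps - 1:
                p_halt = torch.ones_like(p_halt)  # force a halt at the cap
            weight = remainder * p_halt           # mass that halts at this step
            weighted = weighted + weight.view(-1, 1, 1) * x
            remainder = remainder * (1.0 - p_halt)
        return weighted
```

Because the halt probabilities here are differentiable, the expected step count can be trained jointly with the task loss, which is consistent with the abstract's claim that the model learns the number of steps without an oracle.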