{"title":"用网络内计算训练类chatgpt模型","authors":"Shuhao Fu, Yong Liao, Pengyuan Zhou","doi":"10.1145/3600061.3603136","DOIUrl":null,"url":null,"abstract":"ChatGPT shows the enormous potential of large language models (LLMs). These models can easily reach the size of billions of parameters and create training difficulties for the majority. We propose a paradigm to train LLMs using distributed in-network computation on routers. Our preliminary result shows that our design allows LLMs to be trained at a reasonable learning rate without demanding extensive GPU resources.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Training ChatGPT-like Models with In-network Computation\",\"authors\":\"Shuhao Fu, Yong Liao, Pengyuan Zhou\",\"doi\":\"10.1145/3600061.3603136\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ChatGPT shows the enormous potential of large language models (LLMs). These models can easily reach the size of billions of parameters and create training difficulties for the majority. We propose a paradigm to train LLMs using distributed in-network computation on routers. Our preliminary result shows that our design allows LLMs to be trained at a reasonable learning rate without demanding extensive GPU resources.\",\"PeriodicalId\":228934,\"journal\":{\"name\":\"Proceedings of the 7th Asia-Pacific Workshop on Networking\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 7th Asia-Pacific Workshop on Networking\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3600061.3603136\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th Asia-Pacific Workshop on Networking","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3600061.3603136","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Training ChatGPT-like Models with In-network Computation
ChatGPT demonstrates the enormous potential of large language models (LLMs). These models easily reach billions of parameters, which puts training beyond the reach of most practitioners. We propose a paradigm for training LLMs with distributed in-network computation on routers. Our preliminary results show that our design allows LLMs to be trained at a reasonable learning rate without requiring extensive GPU resources.
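The abstract does not spell out how the in-network paradigm works. Purely as an illustrative sketch, and not the authors' design, the following simulates one common way in-network computation is applied to distributed training: a router-like device sums the gradients flowing through it so the parameter server receives a single aggregated update instead of one message per worker. The RouterAggregator class, the toy regression task, and all hyperparameters are hypothetical.

```python
# Illustrative simulation only: workers compute local gradients on data shards,
# and a router-like node aggregates them on the network path before the update.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: y = X @ w_true + noise, split across workers.
NUM_WORKERS, DIM, SAMPLES_PER_WORKER = 4, 8, 64
w_true = rng.normal(size=DIM)
shards = []
for _ in range(NUM_WORKERS):
    X = rng.normal(size=(SAMPLES_PER_WORKER, DIM))
    y = X @ w_true + 0.01 * rng.normal(size=SAMPLES_PER_WORKER)
    shards.append((X, y))

class RouterAggregator:
    """Stands in for an in-network device that sums gradients as they pass through."""
    def __init__(self, dim):
        self.buffer = np.zeros(dim)
        self.count = 0

    def receive(self, grad):
        # Element-wise accumulation is the kind of simple arithmetic
        # that in-network hardware can plausibly perform at line rate.
        self.buffer += grad
        self.count += 1

    def flush(self):
        avg = self.buffer / max(self.count, 1)
        self.buffer[:] = 0.0
        self.count = 0
        return avg

def local_gradient(w, X, y):
    # Gradient of mean squared error for a linear model.
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

w = np.zeros(DIM)
router = RouterAggregator(DIM)
LR = 0.05

for step in range(200):
    for X, y in shards:                 # each worker sends its gradient toward the router
        router.receive(local_gradient(w, X, y))
    w -= LR * router.flush()            # parameter server applies the aggregated update

print("final parameter error:", np.linalg.norm(w - w_true))
```

In this aggregation-style pattern the network device only performs element-wise sums, which is why it is attractive for offloading from GPUs; whether the paper's router-based paradigm follows this design or a different partitioning of the training computation is not stated in the abstract.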