Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models

IF 65.3 · Q1 · Computer Science, Artificial Intelligence · Foundations and Trends in Machine Learning · Pub Date: 2020-07-24 · DOI: 10.1109/MLHPCAI4S51975.2020.00013
Sergio Botelho, Ameya Joshi, Biswajit Khara, S. Sarkar, C. Hegde, Santi S. Adavani, B. Ganapathysubramanian
Citations: 7

Abstract

Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data-free) approaches that successfully solve PDEs have recently been reported, with examples including deep feed-forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty of training these models, especially when predictions are required at large output resolutions (≥ 1024 × 1024). Here we report on a software framework for data-parallel distributed deep learning that resolves the twin challenges of training these large SciML models in reasonable time and distributing their storage requirements. Our framework provides several out-of-the-box capabilities, including (a) loss integrity independent of the number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud and HPC clusters, and report on the interplay between bandwidth, network topology, and bare-metal versus cloud deployments. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are 2–3× faster than stochastic gradient-based methods and exhibit minimal convergence drift at larger batch sizes.
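The abstract names several concrete mechanisms: a (nearly) data-free training loss built from the PDE itself, data-parallel training, synchronized batch normalization, and a reported loss that is independent of the number of processes. The sketch below is a minimal, hypothetical PyTorch illustration of how these pieces might fit together for a Poisson problem on a uniform grid; the network, resolution, finite-difference residual, and the use of Adam in place of the paper's distributed higher-order optimizers are all illustrative assumptions, not the authors' released framework.

```python
# Hypothetical sketch, not the authors' code: data-free residual loss for -Δu = f on a
# uniform grid, trained data-parallel with DistributedDataParallel, SyncBatchNorm, and
# a reported loss averaged over all workers.
# Launch with, e.g.: torchrun --nproc_per_node=4 sketch.py   (filename is illustrative)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def poisson_residual_loss(u, f, h):
    """Mean squared residual of -Δu = f (5-point stencil) plus a Dirichlet boundary
    penalty. No labelled solution data is needed, only the PDE itself."""
    stencil = torch.tensor([[0.0, 1.0, 0.0],
                            [1.0, -4.0, 1.0],
                            [0.0, 1.0, 0.0]], device=u.device).view(1, 1, 3, 3)
    lap = F.conv2d(u, stencil) / h ** 2                      # interior points only
    residual = -lap - f[..., 1:-1, 1:-1]
    boundary = (u[..., 0, :] ** 2).mean() + (u[..., -1, :] ** 2).mean() \
        + (u[..., :, 0] ** 2).mean() + (u[..., :, -1] ** 2).mean()
    return (residual ** 2).mean() + boundary


def main():
    dist.init_process_group("nccl")                          # one process per GPU
    rank, world = dist.get_rank(), dist.get_world_size()
    device = torch.device("cuda", rank % torch.cuda.device_count())

    # Small encoder-decoder-style CNN mapping a forcing field f to a solution field u.
    net = nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )
    # Synchronized BatchNorm: statistics are computed over the global batch, so the
    # normalization (and hence the loss) does not change with the number of processes.
    net = nn.SyncBatchNorm.convert_sync_batchnorm(net).to(device)
    model = DDP(net, device_ids=[device.index])              # averages gradients across workers

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)      # stand-in for higher-order methods
    res, h = 128, 1.0 / 127                                  # grid resolution and spacing

    for step in range(1000):
        f = torch.randn(8, 1, res, res, device=device)       # this process's share of the batch
        u = model(f)
        loss = poisson_residual_loss(u, f, h)
        opt.zero_grad()
        loss.backward()
        opt.step()

        # "Loss integrity": report the loss averaged over all workers so the printed
        # value does not depend on how many processes participated.
        global_loss = loss.detach().clone()
        dist.all_reduce(global_loss, op=dist.ReduceOp.SUM)
        if rank == 0 and step % 100 == 0:
            print(f"step {step}: residual loss {(global_loss / world).item():.4e}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Averaging the reported loss over all workers (the all_reduce above) keeps training curves comparable when the same job is rerun on a different number of GPUs, which is the practical point of process-count-independent loss reporting.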
Source journal
Foundations and Trends in Machine Learning (Computer Science, Artificial Intelligence)
CiteScore: 108.50
Self-citation rate: 0.00%
Articles published: 5
Journal description: Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.
Latest articles in this journal
Model-based Reinforcement Learning: A Survey
Probabilistic Learning
Reinforcement Learning
Support Vector Machine
Advanced Clustering