
Latest publications in arXiv: Learning

High dimensional robust M-estimation: arbitrary corruption and heavy tails
Pub Date : 2021-07-06 DOI: 10.26153/TSW/15001
Liu Liu
We consider the problem of sparsity-constrained $M$-estimation when both explanatory and response variables have heavy tails (bounded 4-th moments), or a fraction of arbitrary corruptions. We focus on the $k$-sparse, high-dimensional regime where the number of variables $d$ and the sample size $n$ are related through $n \sim k \log d$. We define a natural condition we call the Robust Descent Condition (RDC), and show that if a gradient estimator satisfies the RDC, then Robust Hard Thresholding (IHT using this gradient estimator) is guaranteed to obtain good statistical rates. The contribution of this paper is in showing that the RDC is a flexible enough concept to recover known results and obtain new robustness results. Specifically, new results include: (a) for $k$-sparse high-dimensional linear and logistic regression with heavy-tailed (bounded 4-th moment) explanatory and response variables, a linear-time-computable median-of-means gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal; (b) when instead of heavy tails we have an $O(1/\sqrt{k}\log(nd))$-fraction of arbitrary corruptions in explanatory and response variables, a near-linear-time-computable trimmed gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal. We demonstrate the effectiveness of our approach in sparse linear regression, logistic regression, and sparse precision matrix estimation on synthetic and real-world US equities data.
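As a rough illustration of the two ingredients the abstract names (not the authors' code — all names, block counts, and step sizes below are assumptions), iterative hard thresholding driven by a median-of-means gradient estimator can be sketched for squared loss as:

```python
import numpy as np

def median_of_means_gradient(X, y, theta, n_blocks=10):
    """Robust gradient estimate: split samples into blocks, average the
    squared-loss gradient within each block, take a coordinate-wise median."""
    idx = np.array_split(np.random.permutation(X.shape[0]), n_blocks)
    block_grads = []
    for b in idx:
        resid = X[b] @ theta - y[b]
        block_grads.append(X[b].T @ resid / len(b))
    return np.median(np.stack(block_grads), axis=0)

def robust_hard_thresholding(X, y, k, n_iter=200, lr=0.1):
    """IHT sketch: gradient step with a robust estimator, then keep only
    the k largest-magnitude coordinates."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta -= lr * median_of_means_gradient(X, y, theta)
        theta[np.argsort(np.abs(theta))[:-k]] = 0.0
    return theta
```

On well-conditioned noiseless data this recovers the true $k$-sparse support; the point of the paper is that the same outer loop stays minimax optimal once the plugged-in gradient estimator satisfies the RDC.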
Citations: 13
Boosting share routing for multi-task learning.
Pub Date : 2020-09-01 DOI: 10.1145/3442442.3452323
Chen Xiaokai, Gu Xiaoguang, Fu Libo
Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve overall performance. How to share knowledge across tasks appropriately is an open problem in MTL. Most existing deep MTL models are based on parameter sharing. However, a suitable sharing mechanism is hard to design because the relationships among tasks are complicated. In this paper, we propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem. MTNAS modularizes the sharing part into multiple layers of sub-networks. It allows sparse connections among these sub-networks, and soft sharing based on gating is enabled for a given route. Benefiting from such a setting, each candidate architecture in our search space defines a dynamic sparse sharing route, which is more flexible than the full sharing in previous approaches. We show that existing typical sharing approaches are sub-graphs in our search space. Extensive experiments on three real-world recommendation datasets demonstrate that MTNAS achieves consistent improvements over single-task models and typical multi-task methods while maintaining high computational efficiency. Furthermore, in-depth experiments demonstrate that MTNAS can learn suitable sparse routes to mitigate negative transfer.
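The core mechanism the abstract describes — gating-based soft sharing over sub-networks, where a near-one-hot gate approximates a sparse route — can be sketched in a few lines. This is a minimal illustration, not the MTNAS implementation; the expert shapes and gate values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Two candidate shared sub-networks ("experts") with random weights.
W_experts = [rng.normal(size=(8, 4)) for _ in range(2)]

def gated_shared_features(x, gate_logits):
    """Soft sharing: mix expert outputs with a softmax gate.
    A near-one-hot gate approximates a sparse connection (route)."""
    g = np.exp(gate_logits - gate_logits.max())
    g = g / g.sum()
    outs = [relu(W @ x) for W in W_experts]
    return sum(w * o for w, o in zip(g, outs))

x = rng.normal(size=4)
feat_a = gated_shared_features(x, np.array([4.0, -4.0]))  # routes almost entirely through expert 0
feat_b = gated_shared_features(x, np.array([0.0, 0.0]))   # even soft sharing across both experts
```

Per-task gate logits like these are what an architecture search would tune; driving them toward one-hot values yields the dynamic sparse routes described above.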
Citations: 8
Clustering Residential Electricity Consumption Data to Create Archetypes that Capture Household Behaviour in South Africa
Pub Date : 2020-06-11 DOI: 10.18489/sacj.v32i2.845
Wiebke Toussaint, Deshendran Moodley
Clustering is frequently used in the energy domain to identify dominant electricity consumption patterns of households, which can be used to construct customer archetypes for long-term energy planning. Selecting a useful set of clusters, however, requires extensive experimentation and domain knowledge. While internal clustering validation measures are well established in the electricity domain, they are of limited use for selecting useful clusters. Based on an application case study in South Africa, we present an approach for formalising implicit expert knowledge as external evaluation measures to create customer archetypes that capture variability in residential electricity consumption behaviour. By combining internal and external validation measures in a structured manner, we were able to evaluate clustering structures based on the utility they present for our application. We validate the selected clusters in a use case where we successfully reconstruct customer archetypes previously developed by experts. Our approach shows promise for transparent and repeatable cluster ranking and selection by data scientists, even if they have limited domain knowledge.
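The paper's contribution is the cluster *selection* methodology; the underlying clustering step itself is standard. A minimal sketch of clustering 24-hour household load profiles with plain k-means (illustrative only — the synthetic morning/evening archetypes and all parameters are assumptions, not the authors' data or pipeline):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's k-means; X holds one row per household load profile."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign each profile to its nearest center, then recompute centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Synthetic hourly profiles: a morning-peak and an evening-peak archetype.
rng = np.random.default_rng(1)
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 7) / 2.0) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
X = np.vstack([morning + 0.05 * rng.normal(size=24) for _ in range(20)]
              + [evening + 0.05 * rng.normal(size=24) for _ in range(20)])
labels, centers = kmeans(X, k=2)
```

The hard part the paper addresses sits after this step: scoring candidate values of `k` (and candidate partitions) with expert-derived external measures rather than internal indices alone.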
Citations: 4
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
Pub Date : 2020-05-27 DOI: 10.22541/au.158921777.79483839/v2
Jeremy Georges-Filteau, Elisa Cirillo
After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential remains unexploited because of the fiercely private nature of patient-related data and the regulations that protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data-related for the reasons stated above. Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable) and that the evaluation of synthetic data lacked clear metrics. We find more publications on the subject than expected, starting slowly in 2017 and at an increasing rate since then. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.
Citations: 11
A Review of Privacy-Preserving Federated Learning for the Internet-of-Things
Pub Date : 2020-04-24 DOI: 10.1007/978-3-030-70604-3_2
Christopher Briggs, Zhong Fan, Péter András
Citations: 22
Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs
Pub Date : 2020-02-14 DOI: 10.24432/C50018
Zhen Han, Yunpu Ma, Yuyi Wang, Stephan Günnemann, Volker Tresp
The Hawkes process has become a standard method for modeling self-exciting event sequences with different event types. A recent work has generalized the Hawkes process to a neurally self-modulating multivariate point process, which enables the capturing of more complex and realistic impacts of past events on future events. However, this approach is limited by the number of possible event types, making it impossible to model the dynamics of evolving graph sequences, where each possible link between two nodes can be considered as an event type. The number of event types increases even further when links are directional and labeled. To address this issue, we propose the Graph Hawkes Neural Network that can capture the dynamics of evolving graph sequences and can predict the occurrence of a fact in a future time instance. Extensive experiments on large-scale temporal multi-relational databases, such as temporal knowledge graphs, demonstrate the effectiveness of our approach.
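For context on the self-exciting behaviour the abstract builds on (this is the classical univariate Hawkes process, not the paper's graph model), the conditional intensity with an exponential kernel is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$; a direct transcription, with illustrative parameter values:

```python
import math

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)):
    each past event bumps the intensity by alpha, decaying at rate beta."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)
```

The neural generalizations mentioned above replace the fixed kernel with learned, state-dependent excitation; the paper extends this to links in an evolving graph, where each (labeled, directed) edge would otherwise be its own event type.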
Citations: 46
Quasi-Equivalence of Width and Depth of Neural Networks
Pub Date : 2020-02-06 DOI: 10.21203/rs.3.rs-92324/v1
Fenglei Fan, Rongjie Lai, Ge Wang
While classic studies proved that wide networks allow universal approximation, recent research and the successes of deep learning demonstrate the power of network depth. Based on a symmetric consideration, we investigate whether the design of artificial neural networks should have a directional preference, and what the mechanism of interaction is between the width and depth of a network. We address this fundamental question by establishing a quasi-equivalence between the width and depth of ReLU networks. Specifically, we formulate a transformation from an arbitrary ReLU network to a wide network and a deep network, for either regression or classification, so that an essentially identical capability of the original network can be implemented. That is, a deep regression/classification ReLU network has a wide equivalent, and vice versa, subject to an arbitrarily small error. Interestingly, the quasi-equivalence between wide and deep classification ReLU networks is a data-driven version of the De Morgan law.
Citations: 11
Topology Optimization Using Convolutional Neural Network
Pub Date : 2020-01-01 DOI: 10.1007/978-981-15-5432-2_26
B. Harish, Kandula Eswara Sai Kumar, B. Srinivasan
Citations: 33
Quantile Convolutional Neural Networks for Value at Risk Forecasting
Pub Date : 2019-08-21 DOI: 10.1016/J.MLWA.2021.100096
Gábor Petneházi
Citations: 6
Trainability of ReLU networks and Data-dependent Initialization.
Pub Date : 2019-07-23 DOI: 10.1615/.2020034126
Yeonjong Shin, G. Karniadakis
In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as its trainability. We show that a network being trainable is a necessary condition for successful training, and the trainability serves as an upper bound on successful training rates. To quantify the trainability, we study the probability distribution of the number of active neurons at initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to train effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
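The quantity the abstract studies — how many hidden neurons are active at initialization — is easy to measure empirically. A minimal sketch (not the authors' code; the layer sizes and Gaussian initialization are assumptions) that counts one-hidden-layer ReLU units firing on at least one sample:

```python
import numpy as np

def count_active_neurons(W, b, X):
    """Count hidden ReLU units whose pre-activation is positive on at
    least one input sample; units that never fire are dead on this data."""
    pre = X @ W.T + b               # (n_samples, n_hidden) pre-activations
    active = (pre > 0).any(axis=0)  # does each unit fire at least once?
    return int(active.sum())

rng = np.random.default_rng(0)
n_hidden, d = 64, 10
W = rng.normal(size=(n_hidden, d)) / np.sqrt(d)  # assumed 1/sqrt(d) Gaussian init
b = np.zeros(n_hidden)
X = rng.normal(size=(1000, d))
n_active = count_active_neurons(W, b, X)
```

With zero biases and symmetric Gaussian weights, each unit fires on roughly half of the inputs, so with many samples essentially every unit is active; the interesting regimes in the paper arise for other initializations, where the distribution of this count bounds the success rate of training.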
Citations: 2