We consider the problem of sparsity-constrained $M$-estimation when both explanatory and response variables have heavy tails (bounded 4th moments), or when a fraction of the samples suffer arbitrary corruptions. We focus on the $k$-sparse, high-dimensional regime where the number of variables $d$ and the sample size $n$ are related through $n \sim k \log d$. We define a natural condition that we call the Robust Descent Condition (RDC), and show that if a gradient estimator satisfies the RDC, then Robust Hard Thresholding (IHT using this gradient estimator) is guaranteed to obtain good statistical rates. The contribution of this paper is in showing that the RDC is a flexible enough concept to recover known results and to obtain new robustness results. Specifically, the new results include: (a) for $k$-sparse high-dimensional linear and logistic regression with heavy-tailed (bounded 4th moment) explanatory and response variables, a linear-time-computable median-of-means gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal; (b) when, instead of heavy tails, an $O(1/\sqrt{k}\log(nd))$-fraction of the explanatory and response variables suffer arbitrary corruptions, a near-linear-time-computable trimmed gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is again minimax optimal. We demonstrate the effectiveness of our approach on sparse linear regression, sparse logistic regression, and sparse precision matrix estimation, using both synthetic data and real-world US equities data.
{"title":"High dimensional robust M-estimation : arbitrary corruption and heavy tails","authors":"Liu Liu","doi":"10.26153/TSW/15001","DOIUrl":"https://doi.org/10.26153/TSW/15001","url":null,"abstract":"We consider the problem of sparsity-constrained $M$-estimation when both explanatory and response variables have heavy tails (bounded 4-th moments), or a fraction of arbitrary corruptions. We focus on the $k$-sparse, high-dimensional regime where the number of variables $d$ and the sample size $n$ are related through $n sim k log d$. We define a natural condition we call the Robust Descent Condition (RDC), and show that if a gradient estimator satisfies the RDC, then Robust Hard Thresholding (IHT using this gradient estimator), is guaranteed to obtain good statistical rates. The contribution of this paper is in showing that this RDC is a flexible enough concept to recover known results, and obtain new robustness results. Specifically, new results include: (a) For $k$-sparse high-dimensional linear- and logistic-regression with heavy tail (bounded 4-th moment) explanatory and response variables, a linear-time-computable median-of-means gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal; (b) When instead of heavy tails we have $O(1/sqrt{k}log(nd))$-fraction of arbitrary corruptions in explanatory and response variables, a near linear-time computable trimmed gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal. We demonstrate the effectiveness of our approach in sparse linear, logistic regression, and sparse precision matrix estimation on synthetic and real-world US equities data.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84760806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve overall performance. How to share knowledge appropriately across tasks remains an open problem in MTL. Most existing deep MTL models are based on parameter sharing. However, a suitable sharing mechanism is hard to design because the relationships among tasks are complicated. In this paper, we propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem. MTNAS modularizes the sharing part into multiple layers of sub-networks. It allows sparse connections among these sub-networks, and soft sharing based on gating is enabled along a given route. Benefiting from this setting, each candidate architecture in our search space defines a dynamic sparse sharing route, which is more flexible than the full sharing used in previous approaches. We show that existing typical sharing approaches are sub-graphs in our search space. Extensive experiments on three real-world recommendation datasets demonstrate that MTNAS achieves consistent improvement over single-task models and typical multi-task methods while maintaining high computational efficiency. Furthermore, in-depth experiments demonstrate that MTNAS can learn a suitable sparse route that mitigates negative transfer.
{"title":"Boosting share routing for multi-task learning.","authors":"Chen Xiaokai, Gu Xiaoguang, Fu Libo","doi":"10.1145/3442442.3452323","DOIUrl":"https://doi.org/10.1145/3442442.3452323","url":null,"abstract":"Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve the overall performance. How to make the knowledge of multiple tasks shared appropriately is an open problem for MTL. Most existing deep MTL models are based on parameter sharing. However, suitable sharing mechanism is hard to design as the relationship among tasks is complicated. In this paper, we propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem. MTNAS modularizes the sharing part into multiple layers of sub-networks. It allows sparse connection among these sub-networks and soft sharing based on gating is enabled for a certain route. Benefiting from such setting, each candidate architecture in our search space defines a dynamic sparse sharing route which is more flexible compared with full-sharing in previous approaches. We show that existing typical sharing approaches are sub-graphs in our search space. Extensive experiments on three real-world recommendation datasets demonstrate MTANS achieves consistent improvement compared with single-task models and typical multi-task methods while maintaining high computation efficiency. Furthermore, in-depth experiments demonstrates that MTNAS can learn suitable sparse route to mitigate negative transfer.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"102 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80508074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering is frequently used in the energy domain to identify dominant electricity consumption patterns of households, which can be used to construct customer archetypes for long-term energy planning. Selecting a useful set of clusters, however, requires extensive experimentation and domain knowledge. While internal clustering validation measures are well established in the electricity domain, they are of limited use for selecting clusters that are useful in practice. Based on an application case study in South Africa, we present an approach for formalising implicit expert knowledge as external evaluation measures to create customer archetypes that capture variability in residential electricity consumption behaviour. By combining internal and external validation measures in a structured manner, we were able to evaluate clustering structures based on the utility they present for our application. We validate the selected clusters in a use case where we successfully reconstruct customer archetypes previously developed by experts. Our approach shows promise for transparent and repeatable cluster ranking and selection by data scientists, even if they have limited domain knowledge.
{"title":"Clustering Residential Electricity Consumption Data to Create Archetypes that Capture Household Behaviour in South Africa","authors":"Wiebke Toussaint, Deshendran Moodley","doi":"10.18489/sacj.v32i2.845","DOIUrl":"https://doi.org/10.18489/sacj.v32i2.845","url":null,"abstract":"Clustering is frequently used in the energy domain to identify dominant electricity consumption patterns of households, which can be used to construct customer archetypes for long term energy planning. Selecting a useful set of clusters however requires extensive experimentation and domain knowledge. While internal clustering validation measures are well established in the electricity domain, they are limited for selecting useful clusters. Based on an application case study in South Africa, we present an approach for formalising implicit expert knowledge as external evaluation measures to create customer archetypes that capture variability in residential electricity consumption behaviour. By combining internal and external validation measures in a structured manner, we were able to evaluate clustering structures based on the utility they present for our application. We validate the selected clusters in a use case where we successfully reconstruct customer archetypes previously developed by experts. Our approach shows promise for transparent and repeatable cluster ranking and selection by data scientists, even if they have limited domain knowledge.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89120501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-05-27 | DOI: 10.22541/au.158921777.79483839/v2
Jeremy Georges-Filteau, Elisa Cirillo
After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential remains unexploited because of the fiercely private nature of patient-related data and the regulations that protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital-twin simulations in industrial sectors, and medical imaging. The digital-twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data-related for the reasons stated above. Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable), and that the evaluation of synthetic data lacked clear metrics. We found more publications on the subject than expected, starting slowly in 2017 and appearing at an increasing rate since then. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.
{"title":"Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?","authors":"Jeremy Georges-Filteau, Elisa Cirillo","doi":"10.22541/au.158921777.79483839/v2","DOIUrl":"https://doi.org/10.22541/au.158921777.79483839/v2","url":null,"abstract":"\u0000 After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential is unexploited because of the fiercely private nature of patient-related data and regulations to protect it.Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging.The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs posses many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which of are data related for the reasons stated above.Considering these facts, publications concerning GAN applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art model were directly transferable) and the evaluation synthetic data lacked clear metrics.We find more publications on the subject than expected, starting slowly in 2017, and since then at an increasing rate. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73022017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-04-24 | DOI: 10.1007/978-3-030-70604-3_2
Christopher Briggs, Zhong Fan, Péter András
{"title":"A Review of Privacy-Preserving Federated Learning for the Internet-of-Things","authors":"Christopher Briggs, Zhong Fan, Péter András","doi":"10.1007/978-3-030-70604-3_2","DOIUrl":"https://doi.org/10.1007/978-3-030-70604-3_2","url":null,"abstract":"","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"111 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79337536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhen Han, Yunpu Ma, Yuyi Wang, Stephan Günnemann, Volker Tresp
The Hawkes process has become a standard method for modeling self-exciting event sequences with different event types. Recent work has generalized the Hawkes process to a neurally self-modulating multivariate point process, which enables capturing more complex and realistic impacts of past events on future events. However, this approach is limited by the number of possible event types, making it impossible to model the dynamics of evolving graph sequences, where each possible link between two nodes can be considered an event type. The number of event types increases even further when links are directional and labeled. To address this issue, we propose the Graph Hawkes Neural Network, which can capture the dynamics of evolving graph sequences and predict the occurrence of a fact at a future time instance. Extensive experiments on large-scale temporal multi-relational databases, such as temporal knowledge graphs, demonstrate the effectiveness of our approach.
{"title":"Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs","authors":"Zhen Han, Yunpu Ma, Yuyi Wang, Stephan Günnemann, Volker Tresp","doi":"10.24432/C50018","DOIUrl":"https://doi.org/10.24432/C50018","url":null,"abstract":"The Hawkes process has become a standard method for modeling self-exciting event sequences with different event types. A recent work has generalized the Hawkes process to a neurally self-modulating multivariate point process, which enables the capturing of more complex and realistic impacts of past events on future events. However, this approach is limited by the number of possible event types, making it impossible to model the dynamics of evolving graph sequences, where each possible link between two nodes can be considered as an event type. The number of event types increases even further when links are directional and labeled. To address this issue, we propose the Graph Hawkes Neural Network that can capture the dynamics of evolving graph sequences and can predict the occurrence of a fact in a future time instance. Extensive experiments on large-scale temporal multi-relational databases, such as temporal knowledge graphs, demonstrate the effectiveness of our approach.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75237454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-02-06 | DOI: 10.21203/rs.3.rs-92324/v1
Fenglei Fan, Rongjie Lai, Ge Wang
While classic studies proved that wide networks allow universal approximation, recent research and the successes of deep learning demonstrate the power of network depth. Based on this symmetry, we investigate whether the design of artificial neural networks should have a directional preference, and what the mechanism of interaction is between the width and depth of a network. We address this fundamental question by establishing a quasi-equivalence between the width and depth of ReLU networks. Specifically, we formulate a transformation from an arbitrary ReLU network to a wide network and to a deep network, for either regression or classification, such that essentially the same capability as the original network can be implemented. That is, a deep regression/classification ReLU network has a wide equivalent, and vice versa, subject to an arbitrarily small error. Interestingly, the quasi-equivalence between wide and deep classification ReLU networks is a data-driven version of the De Morgan law.
{"title":"Quasi-Equivalence of Width and Depth of Neural Networks","authors":"Fenglei Fan, Rongjie Lai, Ge Wang","doi":"10.21203/rs.3.rs-92324/v1","DOIUrl":"https://doi.org/10.21203/rs.3.rs-92324/v1","url":null,"abstract":"\u0000 While classic studies proved that wide networks allow universal approximation, recent research and successes\u0000of deep learning demonstrate the power of the network depth. Based on a symmetric consideration,\u0000we investigate if the design of artificial neural networks should have a directional preference, and what the\u0000mechanism of interaction is between the width and depth of a network. We address this fundamental question\u0000by establishing a quasi-equivalence between the width and depth of ReLU networks. Specifically, we formulate a\u0000transformation from an arbitrary ReLU network to a wide network and a deep network for either regression\u0000or classification so that an essentially same capability of the original network can be implemented. That is, a\u0000deep regression/classification ReLU network has a wide equivalent, and vice versa, subject to an arbitrarily small\u0000error. Interestingly, the quasi-equivalence between wide and deep classification ReLU networks is a data-driven\u0000version of the DeMorgan law.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"137 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86280945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-08-21 | DOI: 10.1016/J.MLWA.2021.100096
Gábor Petneházi
{"title":"Quantile Convolutional Neural Networks for Value at Risk Forecasting","authors":"G'abor Petneh'azi","doi":"10.1016/J.MLWA.2021.100096","DOIUrl":"https://doi.org/10.1016/J.MLWA.2021.100096","url":null,"abstract":"","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81497931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as its trainability. We show that a network being trainable is a necessary condition for successful training, and that the trainability serves as an upper bound on the rate of successful training. In order to quantify the trainability, we study the probability distribution of the number of active neurons at initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to train effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method for the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
{"title":"Trainability of ReLU networks and Data-dependent Initialization.","authors":"Yeonjong Shin, G. Karniadakis","doi":"10.1615/.2020034126","DOIUrl":"https://doi.org/10.1615/.2020034126","url":null,"abstract":"In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced; tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training and the trainability serves as an upper bound of successful training rates. In order to quantify the trainability, we study the probability distribution of the number of active neurons at the initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91538087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}