Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919699
T. Banerjee, Prudhvi K. Gurram, Gene T. Whipps
Periodic statistical behavior of data is observed in many practical problems encountered in cyber-physical systems and biology. A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to model such data. An optimal stopping theory is developed to solve sequential detection problems for i.p.i.d. processes. The developed theory is then applied to detect a change in the distribution of an i.p.i.d. process. It is shown that the optimal change detection algorithm is a stopping rule based on a periodic sequence of thresholds. Numerical results are provided to demonstrate that a single-threshold policy is not strictly optimal.
Title: A Sequential Detection Theory for Statistically Periodic Random Processes
Published in: 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
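As a toy illustration of a stopping rule driven by a periodic sequence of thresholds, the sketch below runs a CUSUM-style statistic on a Gaussian i.p.i.d. stream with period 4. The per-slot means and threshold values are invented for the demo; they are not the optimal quantities derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 4                                    # period of the i.p.i.d. process
n_max = 200
change_point = 50

# Illustrative per-slot unit-variance Gaussian means before/after the change.
pre_means = np.array([0.0, 0.5, 1.0, 0.5])
post_means = pre_means + 1.0

# A periodic sequence of thresholds, one per slot of the period (illustrative).
thresholds = np.array([4.0, 4.5, 5.0, 4.5])

def log_lr(x, k):
    """Log-likelihood ratio of post- vs. pre-change density in slot k mod T."""
    m0, m1 = pre_means[k % T], post_means[k % T]
    return (m1 - m0) * x - 0.5 * (m1 ** 2 - m0 ** 2)

stat = 0.0
alarm_time = None
for k in range(n_max):
    mean = post_means[k % T] if k >= change_point else pre_means[k % T]
    x = rng.normal(mean, 1.0)
    stat = max(0.0, stat + log_lr(x, k))     # CUSUM-style recursion
    if stat > thresholds[k % T]:             # threshold depends on k mod T
        alarm_time = k
        break

print("alarm at", alarm_time, "; true change at", change_point)
```

The single-threshold policy corresponds to making `thresholds` constant; the paper's point is that letting it vary periodically can do strictly better.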
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919935
V. Kavitha, M. Maheshwari, E. Altman
We consider an example of stochastic games with partial, asymmetric, and non-classical information. We obtain the relevant equilibrium policies using a new approach that allows managing the belief updates in a structured manner. Agents have access only to partial information updates, and our approach is to apply optimal open-loop control until the next information update. The agents continuously control the rates of their Poisson search clocks to acquire the locks; the agent that acquires all the locks before the others receives a reward of one. However, the agents have no information about the acquisition status of the others and incur a cost proportional to their rate process. We solve the problem for the case with two agents and two locks and conjecture the results for N agents. We show that a pair of (partial) state-dependent time-threshold policies forms a Nash equilibrium.
Title: Acquisition Games with Partial-Asymmetric Information
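A minimal Monte Carlo sketch of a two-agent, two-lock race under a simple time-threshold policy (high search rate before a threshold time, low rate after). The model details here (rates, cost coefficient, thresholds, and the rule that a ringing clock grabs a free lock) are invented for illustration; this is not the game model or the equilibrium analysis of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def play(thresholds, lam_hi=2.0, lam_lo=0.2, cost=0.2, horizon=20.0):
    """One sample path of a crude two-agent, two-lock race. Each agent runs a
    Poisson search clock and grabs a free lock when its clock rings; the agent
    that ends up holding both locks earns reward 1, and effort costs `cost`
    per unit of rate x time. Each agent uses a time-threshold policy: rate
    lam_hi before its threshold, lam_lo after (an illustrative policy class)."""
    t, held, costs, free_locks = 0.0, [0, 0], [0.0, 0.0], 2
    while free_locks > 0 and t < horizon:
        rates = [lam_hi if t < th else lam_lo for th in thresholds]
        total = sum(rates)
        dt = rng.exponential(1.0 / total)          # time to the next clock ring
        costs = [c + cost * r * dt for c, r in zip(costs, rates)]
        t += dt
        i = 0 if rng.random() < rates[0] / total else 1   # whose clock rang
        held[i] += 1
        free_locks -= 1
    return [(1.0 if h == 2 else 0.0) - c for h, c in zip(held, costs)]

payoffs = np.mean([play((5.0, 5.0)) for _ in range(4000)], axis=0)
print("average payoffs under symmetric time-thresholds:", payoffs)
```

Sweeping the threshold pair and looking for a fixed point of the best-response map would be the crude numerical analogue of the equilibrium computation.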
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919828
Adithya M. Devraj, A. Bušić, Sean P. Meyn
Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak’s momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov’s acceleration method or a simplification of PolSA. The rates of convergence of SA algorithms are well understood. Under special conditions, the mean square error of the parameter estimates is bounded by $\sigma^{2}/n+o(1/n)$, where $\sigma^{2} \geq 0$ is an identifiable constant. If these conditions fail, the rate is typically sub-linear. There are two well-known SA algorithms that ensure a linear rate with the minimal value of the variance $\sigma^{2}$: the Ruppert-Polyak averaging technique and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that, under mild technical assumptions, the PolSA algorithm also achieves this optimality criterion. This result is established via novel coupling arguments: it is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal-variance (but computationally more expensive) SNR algorithm at rate $O(1/n^{2})$. The newly proposed algorithms are extended to a reinforcement learning setting to obtain new Q-learning algorithms, and numerical results confirm the coupling of PolSA and SNR.
Title: On Matrix Momentum Stochastic Approximation and Applications to Q-learning
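To make the recursions concrete, here is a sketch contrasting the vanilla Robbins-Monro update with a heavy-ball momentum variant on a linear root-finding problem. PolSA's specially designed matrix momentum is replaced by a scalar coefficient `beta` below, so this only illustrates the shape of the recursion, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Root-finding target f(theta) = E[A theta - b + noise] = 0, so theta* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta_star = np.linalg.solve(A, b)

def noisy_f(theta):
    """Noisy observation of f(theta) = A theta - b."""
    return A @ theta - b + rng.normal(0.0, 0.1, size=2)

theta_rm = np.zeros(2)     # plain Robbins-Monro iterate
theta_hb = np.zeros(2)     # heavy-ball momentum iterate
velocity = np.zeros(2)
beta = 0.5                 # scalar momentum stand-in for PolSA's matrix momentum
for n in range(1, 20001):
    a_n = 1.0 / n          # classical decaying step size
    theta_rm = theta_rm - a_n * noisy_f(theta_rm)
    velocity = beta * velocity - a_n * noisy_f(theta_hb)
    theta_hb = theta_hb + velocity

err_rm = np.linalg.norm(theta_rm - theta_star)
err_hb = np.linalg.norm(theta_hb - theta_star)
print(err_rm, err_hb)
```

SNR would instead premultiply the update by an estimate of the inverse Jacobian of f, which is what makes it expensive and what PolSA's matrix momentum is designed to match.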
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919933
D. Shah, C. Yu
We consider the task of tensor estimation, i.e., estimating a low-rank order-3 $n \times n \times n$ tensor from noisy observations of randomly chosen entries in the sparse regime. In the context of matrix (order-2 tensor) estimation, a variety of algorithms have been proposed and analyzed in the literature, including the popular collaborative filtering algorithm that is widely used in practice. In the context of tensor estimation, however, progress has been limited: no natural extensions of collaborative filtering are known beyond “flattening” the tensor into a matrix and applying standard collaborative filtering. As the main contribution of this work, we introduce a generalization of the collaborative filtering algorithm to the setting of tensor estimation and argue that it achieves sample complexity that (nearly) matches the conjectured lower bound. Interestingly, our generalization uses the matrix obtained from the “flattened” tensor to compute similarity as in classical collaborative filtering, but defines a novel “graph” using it. The algorithm recovers the tensor with mean squared error (MSE) decaying to 0 as long as each entry is observed independently with probability $p = \Omega(n^{-3/2+\epsilon})$ for an arbitrarily small $\epsilon > 0$. It turns out that $p = \Omega(n^{-3/2})$ is both the conjectured lower bound and the “connectivity threshold” of the graph used to compute similarity in our algorithm.
Title: Iterative Collaborative Filtering for Sparse Noisy Tensor Estimation
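A crude sketch of collaborative filtering on the mode-1 flattening of a tensor: similarity between rows of the flattened matrix is computed on commonly observed entries, and a missing entry is estimated by averaging the nearest rows. Everything below (rank-1 ground truth, dense sampling, nearest-3-rows rule) is simplified for the demo; the paper's algorithm instead builds a specific graph on this flattening and operates at far sparser p.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 0.5        # toy sizes; the paper operates at much sparser p ~ n^(-3/2+eps)

# Rank-1 ground truth T[i,j,k] = u[i] v[j] w[k].
u, v, w = (rng.uniform(0.5, 1.5, n) for _ in range(3))
T = np.einsum('i,j,k->ijk', u, v, w)

mask = rng.random((n, n, n)) < p                 # each entry observed w.p. p
obs = np.where(mask, T, np.nan)

# Mode-1 flattening: slice i of the tensor becomes row i of an n x n^2 matrix.
flat = obs.reshape(n, n * n)
truth = T.reshape(n, n * n)

def estimate(i, col):
    """Estimate flat[i, col] from the 3 rows closest to row i (distance taken
    over commonly observed entries) among rows where column col is observed."""
    cands = []
    for i2 in range(n):
        if i2 == i or np.isnan(flat[i2, col]):
            continue
        common = ~np.isnan(flat[i]) & ~np.isnan(flat[i2])
        if common.sum() < 5:
            continue
        dist = np.mean((flat[i, common] - flat[i2, common]) ** 2)
        cands.append((dist, flat[i2, col]))
    cands.sort()
    return np.mean([val for _, val in cands[:3]]) if cands else np.nan

missing = np.argwhere(np.isnan(flat))[:50]
errs = [abs(estimate(i, c) - truth[i, c]) for i, c in missing]
print("mean abs error on 50 held-out entries:", np.mean(errs))
```

In the sparse regime direct row overlaps vanish, which is why the paper's similarity computation has to go through longer paths in its graph rather than the direct common-entry distance used here.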
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919653
Karthik Nagarjuna Tunuguntla, P. Siegel
We consider the source coding problem for a binary discrete memoryless source with correlated side information available only at the receiver, where the conditional distribution of the side information given the source is unknown to the encoder. We propose two methods based on polar codes to attain the achievable rates in this setting. The first method incorporates a staircase scheme, which has previously been used for universal polar coding over a compound channel. The second method is based on the technique of universalization via bit-channel combining. We also list the pros and cons of the two proposed methods.
Title: Slepian-Wolf Polar Coding with Unknown Correlation
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919887
Dengwang Tang, V. Subramanian
The balls-in-bins model, in which n balls are sequentially placed into n bins according to some dispatching policy, is an important model with a wide range of applications despite its simplicity. The power-of-d-choices (Pod) policy, in which each ball samples d independent uniform random bins and joins the one with the least load (ties broken arbitrarily), yields a maximum load of $\frac{\log\log n}{\log d} + \Theta(1)$ with high probability whenever $d \geq 2$. Vöcking later proposed a variant of the power-of-d scheme in which the bins are divided into d groups and one bin is sampled from each group. An important feature of this scheme is that ties are broken asymmetrically, based on the groups. Compared with Pod, this scheme reduces the maximum load to $\frac{\log\log n}{d\log\phi_{d}}+\Theta(1)$, where $1 < \phi_{d} < 2$. Our recent work shows that one can replace independent uniform sampling with random-walk-based sampling while matching the performance of Pod in terms of the maximum load over all bins. In this work, we propose multiple derandomized variants of Vöcking's asymmetric scheme and show that they yield the same performance as the original scheme, i.e., the maximum load is bounded by $\frac{\log\log n}{d\log\phi_{d}}+\Theta(1)$.
Title: Derandomized Asymmetrical Balanced Allocation
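The sampling schemes above are easy to simulate. The sketch below compares the maximum load under a single uniform choice, power-of-2 choices, and Vöcking's Always-Go-Left variant (bins split into d groups, one sample per group, ties broken toward the leftmost group); n is a toy value chosen only to make the gap visible quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

def pod_max_load(n, d):
    """Power-of-d: each ball samples d uniform bins and joins the least loaded."""
    bins = np.zeros(n, dtype=int)
    for _ in range(n):
        choices = rng.integers(0, n, size=d)
        bins[choices[np.argmin(bins[choices])]] += 1
    return bins.max()

def go_left_max_load(n, d):
    """Vöcking's Always-Go-Left: bins are split into d groups, one bin is
    sampled from each group, and ties go to the leftmost group (np.argmin
    already returns the leftmost minimizer)."""
    g = n // d
    bins = np.zeros(g * d, dtype=int)
    for _ in range(g * d):
        choices = np.array([rng.integers(0, g) + j * g for j in range(d)])
        bins[choices[np.argmin(bins[choices])]] += 1
    return bins.max()

m1 = pod_max_load(n, 1)
m2 = pod_max_load(n, 2)
gl = go_left_max_load(n, 2)
print(f"one choice: {m1}, power-of-2: {m2}, go-left(2): {gl}")
```

The derandomized variants studied in the paper would replace the fresh uniform samples inside these loops with deterministic or random-walk-driven sequences; this sketch only reproduces the baseline randomized schemes.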
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919772
Zai Shi, A. Eryilmaz
How to escape saddle points is a critical issue in non-convex optimization. Previous methods for this problem mainly assume that the objective function is Hessian-Lipschitz, which leaves a gap for applications involving non-Hessian-Lipschitz functions. In this paper, we propose the Cubic Regularized Alternating Direction Method of Multipliers (CR-ADMM) to escape saddle points of separable non-convex functions containing a non-Hessian-Lipschitz component. By carefully choosing a parameter, we prove that CR-ADMM converges to a local minimum of the original function at a rate of $O(1/T^{1/3})$ over time horizon T, which is faster than gradient-based methods. We also show that when one or more steps of CR-ADMM are not solved exactly, CR-ADMM can converge to a neighborhood of the local minimum. In experiments on matrix factorization problems, CR-ADMM exhibits a faster rate and a lower optimality gap compared with other gradient-based methods. Our approach can also find applications in other scenarios where regularized non-convex cost minimization is performed, such as parameter optimization of deep neural networks.
Title: Cubic Regularized ADMM with Convergence to a Local Minimum in Non-convex Optimization
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919898
Ahmad Alammouri, J. Andrews, F. Baccelli
We characterize the stability, metastability, and stationary regime of traffic dynamics in a single-cell uplink wireless system. The traffic is represented in terms of spatial birth-death processes, in which users arrive as a Poisson point process in time and space, each with a file to transmit to the base station. The service rate of each user is based on its signal-to-interference-plus-noise ratio, where the interference is from other active users in the cell. Once the file is fully transmitted, the user leaves the cell. We derive the necessary and sufficient condition for network stability, which is independent of the specific path loss function as long as it satisfies mild boundedness conditions. A novel observation, shown through mean-field analysis and simulations, is that for a certain range of arrival rates, the network appears stable for a possibly long time but can suddenly become unstable. This property, called metastability, is widely known in statistical physics but rarely observed in wireless communication. Finally, using mean-field analysis, we propose a heuristic characterization of the network steady-state regime when it exists, and demonstrate that it is tight for the whole range of arrival rates.
Title: Stability of Wireless Random Access Systems
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919892
Xiangxiang Xu, Shao-Lun Huang
The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation has been shown to be useful in many machine learning applications, where the alternating conditional expectation (ACE) algorithm is widely adopted to estimate the HGR maximal correlation functions from data samples. In this paper, we consider the asymptotic sample complexity of estimating the HGR maximal correlation functions in semi-supervised learning, where both labeled and unlabeled data samples are used for the estimation. First, we propose a generalized ACE algorithm to deal with the unlabeled data samples. Then, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated by the generalized ACE algorithm. We establish analytical expressions for the error exponents of the learning errors, which indicate the number of training samples required for estimating the HGR maximal correlation functions by the generalized ACE algorithm. Moreover, with our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning under a total sampling budget constraint, and develop an optimal sampling strategy that maximizes the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
Title: On the Asymptotic Sample Complexity of HGR Maximal Correlation Functions in Semi-supervised Learning
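For a discrete joint distribution the classical ACE iteration can be written in a few lines, and its fixed point can be checked against the second singular value of the matrix $B(x,y)=P(x,y)/\sqrt{P_X(x)P_Y(y)}$. The joint pmf below is invented for the demo, and this is the fully supervised ACE with the exact distribution, not the paper's generalized semi-supervised, sample-based version.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative joint pmf over X in {0,1,2}, Y in {0,1,2}.
P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])
Px, Py = P.sum(axis=1), P.sum(axis=0)

# ACE: alternate conditional expectations, re-centering and normalizing.
f = rng.normal(size=3)
for _ in range(200):
    g = (P.T @ f) / Py                 # g(y) = E[f(X) | Y = y]
    g -= g @ Py                        # zero mean under Py
    g /= np.sqrt(g**2 @ Py)            # unit variance under Py
    f = (P @ g) / Px                   # f(x) = E[g(Y) | X = x]
    f -= f @ Px
    f /= np.sqrt(f**2 @ Px)

rho_ace = f @ P @ g                    # E[f(X) g(Y)] = HGR maximal correlation

# Cross-check: the HGR correlation equals the second largest singular value of
# B(x, y) = P(x, y) / sqrt(Px(x) Py(y)).
B = P / np.sqrt(np.outer(Px, Py))
rho_svd = np.linalg.svd(B, compute_uv=False)[1]
print(rho_ace, rho_svd)
```

The centering step is what removes the trivial constant direction (singular value 1 of B), so the alternation behaves as a power iteration on the remaining spectrum.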
Pub Date: 2019-09-01 | DOI: 10.1109/ALLERTON.2019.8919798
Mounia Hamidouche, L. Cottatellucci, Konstantin Avrachenkov
In this article, we analyze the limiting eigenvalue distribution (LED) of random geometric graphs (RGGs). The RGG is constructed by uniformly distributing n nodes on the d-dimensional torus $\Gamma^{d} \equiv [0, 1]^{d}$ and connecting two nodes if their $\ell_{p}$-distance, $p \in [1, \infty]$, is at most $r_{n}$. In particular, we study the LED of the adjacency matrix of RGGs in the connectivity regime, in which the average vertex degree scales as $\log(n)$ or faster, i.e., $\Omega(\log(n))$. In the connectivity regime and under some conditions on the radius $r_{n}$, we show that the LED of the adjacency matrix of RGGs converges to the LED of the adjacency matrix of a deterministic geometric graph (DGG) with nodes on a grid as n goes to infinity. Then, for finite n, we use the structure of the DGG to approximate the eigenvalues of the adjacency matrix of the RGG and provide an upper bound on the approximation error. Index Terms: Random geometric graphs, adjacency matrix, limiting eigenvalue distribution, Lévy distance.
Title: Spectral Analysis of the Adjacency Matrix of Random Geometric Graphs
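A small numerical check of the RGG/DGG comparison: build the $\ell_\infty$ adjacency matrix on the unit torus for uniformly placed nodes and for grid nodes (the DGG), then compare spectra. The values of n, d, and the radius below are toy choices for the demo and are not tied to the paper's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 1          # toy sizes; d fixed to 1 so the grid construction is simple
r = 0.052              # radius chosen off the grid spacing so boundary ties don't matter

def torus_dist(x, y):
    """ell_infinity distance on the unit torus [0, 1)^d."""
    diff = np.abs(x[:, None, :] - y[None, :, :])
    diff = np.minimum(diff, 1.0 - diff)          # wrap-around distance per coordinate
    return diff.max(axis=2)

def adjacency_eigs(points):
    D = torus_dist(points, points)
    A = (D <= r).astype(float)
    np.fill_diagonal(A, 0.0)                     # no self-loops
    return np.sort(np.linalg.eigvalsh(A))[::-1]

rgg_eigs = adjacency_eigs(rng.random((n, d)))                 # uniform nodes
dgg_eigs = adjacency_eigs((np.arange(n) / n).reshape(n, d))   # grid nodes (DGG)

print("top RGG eigenvalues:", rgg_eigs[:3])
print("top DGG eigenvalues:", dgg_eigs[:3])
```

For the grid, every node has exactly the same degree, so the top DGG eigenvalue is that common degree; the RGG's top eigenvalue fluctuates around it, which is the finite-n effect the paper's error bound controls.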