
Journal of Business & Economic Statistics: Latest Articles

A robust approach to heteroskedasticity, error serial correlation and slope heterogeneity in linear models with interactive effects for large panel data
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-05-13 DOI: 10.1080/07350015.2022.2077349
Guowei Cui, Kazuhiko Hayakawa, Shuichi Nagata, Takashi Yamagata
Abstract In this article, we propose a robust approach against heteroscedasticity, error serial correlation and slope heterogeneity in linear models with interactive effects for large panel data. First, consistency and asymptotic normality of the pooled iterated principal component (IPC) estimator for random coefficient and homogeneous slope models are established. Then, we prove the asymptotic validity of the associated Wald test for slope parameter restrictions based on the panel heteroscedasticity and autocorrelation consistent (PHAC) variance matrix estimator for both random coefficient and homogeneous slope models, which does not require the Newey-West type time-series parameter truncation. These results asymptotically justify the use of the same pooled IPC estimator and the PHAC standard error for both homogeneous-slope and heterogeneous-slope models. This robust approach can significantly reduce the model selection uncertainty for applied researchers. In addition, we propose a Lagrange Multiplier (LM) test for correlated random coefficients with covariates. This test has nontrivial power against correlated random coefficients, but not for random coefficients and homogeneous slopes. The LM test is important because the IPC estimator becomes inconsistent with correlated random coefficients. The finite sample evidence and an empirical application support the reliability and the usefulness of our robust approach.
Citations: 2
Singular Conditional Autoregressive Wishart Model for Realized Covariance Matrices
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-05-11 DOI: 10.1080/07350015.2022.2075370
Gustav Alfelt, Taras Bodnar, F. Javed, J. Tyrcha
Abstract Realized covariance matrices are often constructed under the assumption that the richness of intra-day return data is greater than the portfolio size, resulting in nonsingular matrix measures. However, when, for example, the portfolio size is large, assets suffer from illiquidity issues, or market microstructure noise deters sampling at very high frequencies, this relation is not guaranteed. Under these common conditions, realized covariance matrices may be singular by construction. Motivated by this situation, we introduce the Singular Conditional Autoregressive Wishart (SCAW) model to capture the temporal dynamics of time series of singular realized covariance matrices, extending the rich literature on econometric Wishart time series models to the singular case. This model is further developed by covariance targeting adapted to matrices and a sector-wise BEKK specification, allowing excellent scalability to large and extremely large portfolio sizes. Finally, the model is fitted to a 20-year time series containing 50 stocks and to a 10-year time series containing 300 stocks, and evaluated using out-of-sample forecast accuracy. It outperforms the benchmark models with high statistical significance, and the parsimonious specifications perform better than the baseline SCAW model while using considerably fewer parameters.
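The singularity described in this abstract can be illustrated with a minimal Python sketch, which is not taken from the paper. It assumes the simplest realized covariance measure, the sum of outer products of intraday return vectors, whose rank cannot exceed the number of intraday observations. The function name `realized_covariance`, the simulated Gaussian returns, and the sample sizes are all illustrative assumptions.

```python
import numpy as np

def realized_covariance(returns):
    """Realized covariance as the sum of outer products r_t r_t'.

    `returns` is an (n_intraday, n_assets) array. This plain estimator is an
    illustrative assumption, not the specific measure used in the paper.
    """
    returns = np.asarray(returns, dtype=float)
    return returns.T @ returns

rng = np.random.default_rng(0)
n_assets = 50

# Fewer intraday observations (20) than assets (50): rank is capped at 20,
# so the 50 x 50 matrix is singular by construction.
sparse_day = realized_covariance(rng.normal(scale=1e-3, size=(20, n_assets)))

# More observations (100) than assets: full rank for generic continuous data.
rich_day = realized_covariance(rng.normal(scale=1e-3, size=(100, n_assets)))

print(np.linalg.matrix_rank(sparse_day), np.linalg.matrix_rank(rich_day))
```

Comparing the two printed ranks with `n_assets` shows which of the two matrices can be inverted; models that require a nonsingular input apply only to the second case.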
Citations: 1
Network Gradient Descent Algorithm for Decentralized Federated Learning
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-05-06 DOI: 10.1080/07350015.2022.2074426
Shuyuan Wu, Danyang Huang, Hansheng Wang
Abstract We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing privacy risk. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure without a central master. This greatly enhances the reliability of the entire algorithm. These appealing properties motivate us to study the NGD method carefully, both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the NGD estimator’s statistical efficiency. The resulting NGD estimator can be statistically as efficient as the global estimator if the learning rate is sufficiently small and the network structure is weakly balanced, even if the data are distributed heterogeneously. These findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also presented for illustration.
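The idea of gradient descent over a network without a central master can be sketched in Python for the linear regression case. This is a generic decentralized gradient descent update (each client averages its neighbours' estimates, then takes a local gradient step) and is not claimed to be the paper's exact algorithm. The ring network with equal weights, the function names `ring_weights` and `ngd_linear_regression`, the learning rate, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def ring_weights(n_clients):
    """Equal-weight ring: each client averages itself and its two neighbours.

    Assumes n_clients >= 3. Rows and columns both sum to one, which is one
    simple balanced design (an illustrative choice, not the paper's).
    """
    W = np.zeros((n_clients, n_clients))
    for m in range(n_clients):
        for k in (m - 1, m, m + 1):
            W[m, k % n_clients] = 1.0 / 3.0
    return W

def ngd_linear_regression(X_parts, y_parts, W, lr=0.05, n_iter=2000):
    """Decentralized gradient descent for least squares; returns one estimate per client."""
    n_clients, p = len(X_parts), X_parts[0].shape[1]
    theta = np.zeros((n_clients, p))
    for _ in range(n_iter):
        mixed = W @ theta  # only parameter estimates are exchanged, never raw data
        grads = np.stack([
            X.T @ (X @ mixed[m] - y) / len(y)  # local least-squares gradient
            for m, (X, y) in enumerate(zip(X_parts, y_parts))
        ])
        theta = mixed - lr * grads
    return theta

# Simulated heterogeneous data: covariate scale differs across the 5 clients.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
X_parts, y_parts = [], []
for m in range(5):
    X = rng.normal(scale=1.0 + 0.2 * m, size=(200, 3))
    X_parts.append(X)
    y_parts.append(X @ theta_true + rng.normal(scale=0.05, size=200))

theta_ngd = ngd_linear_regression(X_parts, y_parts, ring_weights(5))
# Global (pooled) least-squares estimator for comparison.
theta_ols = np.linalg.lstsq(np.vstack(X_parts), np.concatenate(y_parts), rcond=None)[0]
print(np.abs(theta_ngd - theta_ols).max())
```

The printed value is the largest gap between any client's estimate and the pooled least-squares estimate. Varying `lr` or replacing `ring_weights` with an unbalanced weight matrix is a simple way to explore how the learning rate and network structure affect that gap in this toy setting.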
Citations: 3
Estimation of Panel Data Models with Random Interactive Effects and Multiple Structural Breaks when T is Fixed
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-22 DOI: 10.1080/07350015.2022.2067546
Y. Kaddoura, J. Westerlund
Abstract In this article, we propose a new estimator of panel data models with random interactive effects and multiple structural breaks that is suitable when the number of time periods, T, is fixed and only the number of cross-sectional units, N, is large. This is done by viewing the determination of the breaks as a shrinkage problem and estimating the regression coefficients, the number of breaks, and their locations by applying a version of the Lasso approach. We show that, with probability approaching one, the approach correctly determines the number of breaks and the dates of these breaks, and that the estimator of the regime-specific regression coefficients is consistent and asymptotically normal. We also provide Monte Carlo results suggesting that the approach performs very well in small samples, and empirical results suggesting that while the coefficients of the controls are breaking, the coefficients of the main deterrence regressors in a model of crime are not.
Citations: 4
Combining p-values for Multivariate Predictive Ability Testing
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-19 DOI: 10.1080/07350015.2022.2067545
Lars Spreng, G. Urga
Abstract In this article, we propose an intersection-union test for multivariate forecast accuracy based on the combination of a sequence of univariate tests. The testing framework evaluates a global null hypothesis of equal predictive ability using any number of univariate forecast accuracy tests under arbitrary dependence structures, without specifying the underlying multivariate distribution. An extensive Monte Carlo simulation exercise shows that our proposed test has very good size and power properties under several relevant scenarios, and performs well in both low- and high-dimensional settings. We illustrate the empirical validity of our testing procedure using a large dataset of 84 daily exchange rates running from January 1, 2011 to April 1, 2021. We show that our proposed test addresses inconclusive results that often arise in practice.
Citations: 0
Structural Breaks in Grouped Heterogeneity
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-08 DOI: 10.1080/07350015.2022.2063132
Simon C. Smith
Abstract Generating accurate forecasts in the presence of structural breaks requires careful management of bias-variance tradeoffs. Forecasting panel data under breaks offers the possibility to reduce parameter estimation error without inducing any bias if there exists a regime-specific pattern of grouped heterogeneity. To this end, we develop a new Bayesian methodology to estimate and formally test panel regression models in the presence of multiple breaks and unobserved regime-specific grouped heterogeneity. In an empirical application to forecasting inflation rates across 20 U.S. industries, our method generates significantly more accurate forecasts relative to a range of popular methods.
Citations: 4
Rejoinder: “Co-citation and Co-authorship Networks of Statisticians”
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-03 DOI: 10.1080/07350015.2022.2055358
Pengsheng Ji, Jiashun Jin, Z. Ke, Wanshan Li
We thank David Donoho for very encouraging comments. As always, his penetrating vision and deep thoughts are extremely stimulating. We are glad that he summarizes a major philosophical difference between statistics in earlier years (e.g., the time of Francis Galton) and statistics in our time by just a few words: data-first versus model-first. We completely agree with his comment that “each effort by a statistics researcher to understand a newly available type of data enlarges our field; it should be a primary part of the career of statisticians to cultivate an interest in cultivating new types of datasets, so that new methodology can be discovered and developed”; these are exactly the motivations underlying our (several-year) efforts in collecting, cleaning, and analyzing a large-scale high-quality dataset. We would like to add that both traditions have strengths, and combining the strengths of two sides may greatly help statisticians deal with the so-called crisis of the 21st century in statistics we face today. Let us explain the crisis above first. In the model-first tradition, with a particular application problem in mind, we propose a model, develop a method and justify its optimality by some hard-to-prove theorems, and find a dataset to support the approach. In this tradition, we put a lot of faith on our model and our theory: we hope the model is adequate, and we hope our optimality theory warrants the superiority of our method over others. Modern machine learning literature (especially the recent development of deep learning) provides a different approach to justifying the “superiority” of an approach; we compare the proposed approach with existing approaches by the real data results over a dozen of benchmark datasets. To choose an algorithm for their dataset, a practitioner does not necessarily need warranties from a theorem; a superior performance over many benchmark datasets says it all. 
To some theoretical statisticians, this is rather disappointing, as they come from a long
Citations: 1
Discussion of “Cocitation and Coauthorship Networks of Statisticians”
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-03 DOI: 10.1080/07350015.2022.2037432
Haolei Weng, Yang Feng
Abstract We congratulate the authors for their stimulating and thought-provoking work on network data analysis. In the article, the authors not only introduce a new large-scale and high-quality publication dataset that will surely become an important benchmark for further network research, but also present novel statistical methods and modeling which lead to very interesting findings about the statistics community. There is much material for thought and exploration. In this discussion, we will focus on the cocitation networks, and discuss a few points for the coauthorship networks toward the end.
Citations: 1
Discussion of “Co-citation and Co-authorship Networks of Statisticians” by Pengsheng Ji, Jiashun Jin, Zheng Tracy Ke, and Wanshan Li
IF 3 Zone 2 (Mathematics) Q1 ECONOMICS Pub Date: 2022-04-03 DOI: 10.1080/07350015.2022.2041423
Peter Macdonald, E. Levina, Ji Zhu
We congratulate the authors on an interesting paper and on making an important contribution to the network analysis community through compiling a large new dataset which will spur further work on multilayer, dynamic and other complex network settings. This discussion focuses on the paper’s particular methods and applications in dynamic network analysis. Complexity of dynamic network data leads to many necessary analyst choices in both data processing and network modeling. Where possible, we will compare the choices made in this paper with other possibilities from recent literature on dynamic network analysis. One of the important points of the paper is that much of our network data has always been dynamic. For instance, communication networks consisting of sent and received E-mails come with time stamps, whether we choose to incorporate them or not. Developing statistical methods that take advantage of this time varying structure will lead to greater efficiency, novel insights, and generally allow us to take full advantage of rich modern datasets like the one featured in this paper.
Citations: 0
Data Come First: Discussion of “Co-citation and Co-authorship Networks of Statisticians”
IF 3 Zone 2 Mathematics Q1 ECONOMICS Pub Date: 2022-04-03 DOI: 10.1080/07350015.2022.2055356
D. Donoho
I salute the authors for their gift to the world of this new dataset! They have clearly invested plenty of time, effort, and IQ points in the study of the statistics literature as a bibliometric laboratory, and our field will grow and develop because of this dataset, as well as the methodology the authors developed and/or fine-tuned with those data. Strikingly, the article also conveys a great deal of enthusiasm for the data! This seems quite a departure from the pattern of many articles in statistics today. The enthusiastic spirit reminds me of some classic work by great figures in the history of statistics, who often were fascinated by new kinds of data which were just becoming available in their day, and who were inspired by the new data to invent fundamental new statistical tools and mathematical machinery. Francis Galton was interested in the relationships between father’s height and son’s height, himself compiling an extensive bivariate dataset of such heights, leading to the invention of the bivariate normal distribution and the correlation coefficient. Time and time again, new types of data came first, new types of models and methodology later. Indeed, this seems almost inevitable. As new technologies come onstream, new kinds of measurements become available, and new settings for data analysis and statistical inference emerge. This is plain to see in recent decades, where computational biology produced gene expression data, DNA sequence data, SNP data, and RNA-Seq data, each new data type leading to interesting methodological challenges and scientific progress. For me, each effort by a statistics researcher to understand a newly available type of data enlarges our field; it should be a primary part of the career of statisticians to cultivate an interest in new types of datasets, so that new methodology can be discovered and developed.
Citation: D. Donoho. Data Come First: Discussion of “Co-citation and Co-authorship Networks of Statisticians”. Journal of Business & Economic Statistics, vol. 40, p. 491. Published 2022-04-03. DOI: 10.1080/07350015.2022.2055356
Citations: 0