Journal of Business & Economic Statistics: Latest Articles

Detection of Multiple Structural Breaks in Large Covariance Matrices

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-05-18 DOI: 10.1080/07350015.2022.2076686

Yu-Ning Li, Degui Li, P. Fryzlewicz

Abstract This article studies multiple structural breaks in large contemporaneous covariance matrices of high-dimensional time series satisfying an approximate factor model. The breaks in the second-order moment structure of the common components are due to sudden changes in either factor loadings or covariance of latent factors, requiring appropriate transformation of the factor models to facilitate estimation of the (transformed) common factors and factor loadings via the classical principal component analysis. With the estimated factors and idiosyncratic errors, an easy-to-implement CUSUM-based detection technique is introduced to consistently estimate the location and number of breaks and correctly identify whether they originate in the common or idiosyncratic error components. The algorithms of Wild Binary Segmentation for Covariance (WBS-Cov) and Wild Sparsified Binary Segmentation for Covariance (WSBS-Cov) are used to estimate breaks in the common and idiosyncratic error components, respectively. Under some technical conditions, the asymptotic properties of the proposed methodology are derived with near-optimal rates (up to a logarithmic factor) achieved for the estimated breaks. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the developed method and its comparison with other existing approaches. We finally apply our method to study the contemporaneous covariance structure of daily returns of S&P 500 constituents and identify a few breaks including those occurring during the 2007–2008 financial crisis and the recent coronavirus (COVID-19) outbreak. A package is provided to implement the proposed algorithms.
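As a rough illustration of the CUSUM idea mentioned in the abstract, the following sketch locates a single break in the second moment of one univariate series. It is not the WBS-Cov or WSBS-Cov algorithm: there is no factor model, no random-interval search, and no threshold for deciding whether a break exists. The function name `cusum_break` and the simulated data are assumptions made for illustration only.

```python
import math
import random


def cusum_break(y):
    """Return (b, stat), where b is the split point in 1..n-1 that maximizes
    the CUSUM statistic sqrt(b(n-b)/n) * |mean(y[:b]) - mean(y[b:])|."""
    n = len(y)
    total = sum(y)
    best_b, best_stat = None, -1.0
    left = 0.0
    for b in range(1, n):
        left += y[b - 1]          # running sum of the first b observations
        right = total - left      # sum of the remaining n - b observations
        stat = math.sqrt(b * (n - b) / n) * abs(left / b - right / (n - b))
        if stat > best_stat:
            best_b, best_stat = b, stat
    return best_b, best_stat


# Toy data (hypothetical): the standard deviation jumps from 1 to 3 at t = 200.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(0, 3) for _ in range(200)]

# Apply CUSUM to the squared series, so a variance change becomes a mean change.
b_hat, stat = cusum_break([v * v for v in x])
print(b_hat, stat)
```

Squaring the series turns a change in variance into a change in mean, which is what the CUSUM statistic targets. A multiple-break procedure would apply such a statistic repeatedly over many sub-intervals and compare it with a threshold.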

Citations: 8

A robust approach to heteroskedasticity, error serial correlation and slope heterogeneity in linear models with interactive effects for large panel data

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-05-13 DOI: 10.1080/07350015.2022.2077349

Guowei Cui, Kazuhiko Hayakawa, Shuichi Nagata, Takashi Yamagata

Abstract In this article, we propose a robust approach against heteroscedasticity, error serial correlation and slope heterogeneity in linear models with interactive effects for large panel data. First, consistency and asymptotic normality of the pooled iterated principal component (IPC) estimator for random coefficient and homogeneous slope models are established. Then, we prove the asymptotic validity of the associated Wald test for slope parameter restrictions based on the panel heteroscedasticity and autocorrelation consistent (PHAC) variance matrix estimator for both random coefficient and homogeneous slope models, which does not require the Newey-West type time-series parameter truncation. These results asymptotically justify the use of the same pooled IPC estimator and the PHAC standard error for both homogeneous-slope and heterogeneous-slope models. This robust approach can significantly reduce the model selection uncertainty for applied researchers. In addition, we propose a Lagrange Multiplier (LM) test for correlated random coefficients with covariates. This test has nontrivial power against correlated random coefficients, but not for random coefficients and homogeneous slopes. The LM test is important because the IPC estimator becomes inconsistent with correlated random coefficients. The finite sample evidence and an empirical application support the reliability and the usefulness of our robust approach.

Citations: 2

Singular Conditional Autoregressive Wishart Model for Realized Covariance Matrices

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-05-11 DOI: 10.1080/07350015.2022.2075370

Gustav Alfelt, Taras Bodnar, F. Javed, J. Tyrcha

Abstract Realized covariance matrices are often constructed under the assumption that richness of intra-day return data is greater than the portfolio size, resulting in nonsingular matrix measures. However, when, for example, the portfolio size is large, assets suffer from illiquidity issues, or market microstructure noise deters sampling at very high frequencies, this relation is not guaranteed. Under these common conditions, realized covariance matrices may be singular by construction. Motivated by this situation, we introduce the Singular Conditional Autoregressive Wishart (SCAW) model to capture the temporal dynamics of time series of singular realized covariance matrices, extending the rich literature on econometric Wishart time series models to the singular case. The model is further developed with covariance targeting adapted to matrices and a sector-wise BEKK specification, allowing excellent scalability to large and extremely large portfolio sizes. Finally, the model is fitted to a 20-year time series containing 50 stocks and to a 10-year time series containing 300 stocks, and evaluated using out-of-sample forecast accuracy. It outperforms the benchmark models with high statistical significance, and the parsimonious specifications perform better than the baseline SCAW model while using considerably fewer parameters.

Citations: 1

Network Gradient Descent Algorithm for Decentralized Federated Learning

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-05-06 DOI: 10.1080/07350015.2022.2074426

Shuyuan Wu, Danyang Huang, Hansheng Wang

Abstract We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the privacy risk. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure without a central master. This greatly enhances the reliability of the entire algorithm. These nice properties inspire us to carefully study the NGD method both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the NGD estimator’s statistical efficiency. The resulting NGD estimator can be statistically as efficient as the global estimator if the learning rate is sufficiently small and the network structure is weakly balanced, even if the data are distributed heterogeneously. These interesting findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also presented for illustration purposes.
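The decentralized update described in the abstract can be sketched for a one-parameter linear regression. In this toy version each client averages the current estimates of its network neighbors and then takes one gradient step on its own local least-squares loss. This is one common form of a decentralized gradient update and may differ in detail from the authors' NGD; the function name `ngd_linear`, the ring network, and the noiseless data (true slope 2) are assumptions for illustration.

```python
def ngd_linear(data, neighbors, lr=0.1, steps=500):
    """Decentralized gradient descent for y = theta * x (scalar theta).

    data[m]      -- (xs, ys) held privately by client m
    neighbors[m] -- indices of the clients whose estimates client m receives
    Only parameter estimates are exchanged, never the raw data.
    """
    theta = [0.0] * len(data)
    for _ in range(steps):
        new_theta = []
        for m, (xs, ys) in enumerate(data):
            # Step 1: average the neighbors' current estimates.
            avg = sum(theta[k] for k in neighbors[m]) / len(neighbors[m])
            # Step 2: gradient of the local loss (1/2n) * sum (y - x*theta)^2 at avg.
            grad = sum(x * (x * avg - y) for x, y in zip(xs, ys)) / len(xs)
            new_theta.append(avg - lr * grad)
        theta = new_theta  # all clients update simultaneously
    return theta


# Four clients with heterogeneous covariates but the same true slope (2.0).
xs_by_client = [[1.0, 2.0], [0.5, 1.5], [2.0, 3.0], [1.0, 1.0]]
data = [(xs, [2.0 * x for x in xs]) for xs in xs_by_client]

# Ring network: each client listens to its two adjacent clients.
ring = {m: [(m - 1) % 4, (m + 1) % 4] for m in range(4)}

estimates = ngd_linear(data, ring)
print(estimates)
```

Because the toy data are noiseless, every local loss is minimized at the same slope, so all clients converge to it. With noisy, heterogeneous data the learning rate and the network weights would matter, which is the trade-off the abstract refers to.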

Citations: 3

Estimation of Panel Data Models with Random Interactive Effects and Multiple Structural Breaks when T is Fixed

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-22 DOI: 10.1080/07350015.2022.2067546

Y. Kaddoura, J. Westerlund

Abstract In this article, we propose a new estimator of panel data models with random interactive effects and multiple structural breaks that is suitable when the number of time periods, T, is fixed and only the number of cross-sectional units, N, is large. This is done by viewing the determination of the breaks as a shrinkage problem and estimating the regression coefficients, the number of breaks, and their locations by applying a version of the Lasso approach. We show that, with probability approaching one, the approach can correctly determine the number of breaks and the dates of these breaks, and that the estimator of the regime-specific regression coefficients is consistent and asymptotically normal. We also provide Monte Carlo results suggesting that the approach performs very well in small samples, and empirical results suggesting that while the coefficients of the controls are breaking, the coefficients of the main deterrence regressors in a model of crime are not.

Citations: 4

Combining p-values for Multivariate Predictive Ability Testing

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-19 DOI: 10.1080/07350015.2022.2067545

Lars Spreng, G. Urga

Abstract In this article, we propose an intersection-union test for multivariate forecast accuracy based on the combination of a sequence of univariate tests. The testing framework evaluates a global null hypothesis of equal predictive ability using any number of univariate forecast accuracy tests under arbitrary dependence structures, without specifying the underlying multivariate distribution. An extensive Monte Carlo simulation exercise shows that our proposed test has very good size and power properties under several relevant scenarios, and performs well in both low- and high-dimensional settings. We illustrate the empirical validity of our testing procedure using a large dataset of 84 daily exchange rates running from January 1, 2011 to April 1, 2021. We show that our proposed test addresses inconclusive results that often arise in practice.
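To make the idea of combining univariate p-values without modeling their dependence concrete, here is a minimal sketch using the Bonferroni combination, which is valid for a global null under arbitrary dependence. It is a standard stand-in, not the combination rule proposed in the article, and the function name `bonferroni_global_p` and the example p-values are hypothetical.

```python
def bonferroni_global_p(p_values):
    """Combine K univariate p-values into one p-value for the global null
    that every individual null holds; valid under arbitrary dependence."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return min(1.0, len(p_values) * min(p_values))


# Hypothetical p-values from three univariate forecast accuracy tests.
combined = bonferroni_global_p([0.01, 0.20, 0.50])
print(combined)
```

The global null is rejected at level alpha when the combined value is at most alpha. The price of making no dependence assumption is conservativeness, which grows with the number of tests combined.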

Citations: 0

Structural Breaks in Grouped Heterogeneity

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-08 DOI: 10.1080/07350015.2022.2063132

Simon C. Smith

Abstract Generating accurate forecasts in the presence of structural breaks requires careful management of bias-variance tradeoffs. Forecasting panel data under breaks offers the possibility to reduce parameter estimation error without inducing any bias if there exists a regime-specific pattern of grouped heterogeneity. To this end, we develop a new Bayesian methodology to estimate and formally test panel regression models in the presence of multiple breaks and unobserved regime-specific grouped heterogeneity. In an empirical application to forecasting inflation rates across 20 U.S. industries, our method generates significantly more accurate forecasts relative to a range of popular methods.

Citations: 4

Rejoinder: “Co-citation and Co-authorship Networks of Statisticians”

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-03 DOI: 10.1080/07350015.2022.2055358

Pengsheng Ji, Jiashun Jin, Z. Ke, Wanshan Li

We thank David Donoho for very encouraging comments. As always, his penetrating vision and deep thoughts are extremely stimulating. We are glad that he summarizes a major philosophical difference between statistics in earlier years (e.g., the time of Francis Galton) and statistics in our time in just a few words: data-first versus model-first. We completely agree with his comment that “each effort by a statistics researcher to understand a newly available type of data enlarges our field; it should be a primary part of the career of statisticians to cultivate an interest in cultivating new types of datasets, so that new methodology can be discovered and developed”; these are exactly the motivations underlying our (several-year) efforts in collecting, cleaning, and analyzing a large-scale high-quality dataset. We would like to add that both traditions have strengths, and combining the strengths of both sides may greatly help statisticians deal with the so-called crisis of the 21st century in statistics we face today. Let us explain the crisis above first. In the model-first tradition, with a particular application problem in mind, we propose a model, develop a method and justify its optimality by some hard-to-prove theorems, and find a dataset to support the approach. In this tradition, we put a lot of faith in our model and our theory: we hope the model is adequate, and we hope our optimality theory warrants the superiority of our method over others. Modern machine learning literature (especially the recent development of deep learning) provides a different approach to justifying the “superiority” of an approach; we compare the proposed approach with existing approaches by the real data results over a dozen benchmark datasets. To choose an algorithm for their dataset, a practitioner does not necessarily need warranties from a theorem; a superior performance over many benchmark datasets says it all. To some theoretical statisticians, this is rather disappointing, as they come from a long

Citations: 1

Discussion of “Cocitation and Coauthorship Networks of Statisticians”

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-03 DOI: 10.1080/07350015.2022.2037432

Haolei Weng, Yang Feng

Abstract We congratulate the authors for their stimulating and thought-provoking work on network data analysis. In the article, the authors not only introduce a new large-scale and high-quality publication dataset that will surely become an important benchmark for further network research, but also present novel statistical methods and modeling which lead to very interesting findings about the statistics community. There is much material for thought and exploration. In this discussion, we will focus on the cocitation networks, and discuss a few points for the coauthorship networks toward the end.

Citations: 1

Discussion of “Co-citation and Co-authorship Networks of Statisticians” by Pengsheng Ji, Jiashun Jin, Zheng Tracy Ke, and Wanshan Li

IF 3 | CAS Zone 2, Mathematics | Q1 ECONOMICS

Journal of Business & Economic Statistics

Pub Date : 2022-04-03 DOI: 10.1080/07350015.2022.2041423

Peter Macdonald, E. Levina, Ji Zhu

We congratulate the authors on an interesting paper and on making an important contribution to the network analysis community through compiling a large new dataset which will spur further work on multilayer, dynamic and other complex network settings. This discussion focuses on the paper’s particular methods and applications in dynamic network analysis. The complexity of dynamic network data leads to many necessary analyst choices in both data processing and network modeling. Where possible, we will compare the choices made in this paper with other possibilities from recent literature on dynamic network analysis. One of the important points of the paper is that much of our network data has always been dynamic. For instance, communication networks consisting of sent and received emails come with time stamps, whether we choose to incorporate them or not. Developing statistical methods that take advantage of this time-varying structure will lead to greater efficiency, novel insights, and generally allow us to take full advantage of rich modern datasets like the one featured in this paper.

Citations: 0
