On the Wisdom of Crowds
(of Economists)

Francis X. Diebold
University of Pennsylvania
and NBER
Aarón Mora
University of South Carolina
Minchul Shin
Federal Reserve Bank of Philadelphia

Abstract: We study the properties of macroeconomic survey forecast response averages as the number of survey respondents grows. Such averages are “portfolios” of forecasts. We characterize the speed and pattern of the gains from diversification and their eventual decrease with portfolio size (the number of survey respondents) in both (1) the key real-world data-based environment of the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical model-based environment of equicorrelated forecast errors. We proceed by proposing and comparing various direct and model-based “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We then estimate the equicorrelation model for growth and inflation forecast errors by choosing model parameters to minimize the divergence between direct and model-based signature plots. The results indicate near-perfect equicorrelation model fit for both growth and inflation, which we explicate by showing analytically that, under certain conditions, the direct and fitted equicorrelation model-based signature plots are identical at a particular model parameter configuration. We find that the gains from diversification are greater for inflation forecasts than for growth forecasts, but that both sets of gains nevertheless decrease quite quickly, so that fewer SPF respondents than currently used may be adequate.


Acknowledgments: For helpful comments and discussions we thank seminar and conference participants at the University of Washington and Cornell University, the International Conference on Macroeconomic Analysis and International Finance (Rethimno, Crete), and the Conference on Real-Time Data Analysis, Methods and Applications in Macroeconomics and Finance (Bank of Canada, Ottawa). For research assistance we thank Jacob Broussard. Any remaining errors are ours alone.


Key words: Macroeconomic surveys of professional forecasters, forecast combination, model averaging, equicorrelation


JEL codes: C5, C8, E3, E6


1 Introduction and Basic Framework

The wisdom of crowds, or lack thereof, is traditionally and presently a central issue in psychology, history, and political science; see for example Surowiecki (2005) regarding wisdom, and Aliber et al. (2023) regarding lack thereof. Perhaps most prominently, however, the wisdom of crowds is also—and again, traditionally and presently—a central issue in economics and finance, where heterogeneous information and expectations formation take center stage.\footnote{Interestingly, moreover, it also features prominently in new disciplines like machine learning and artificial intelligence, via forecast combination methods like ensemble averaging (e.g., Diebold et al., 2023).}

In this paper we focus on economics and finance, studying the “wisdom” of “crowds” of professional economists. We focus on the U.S. Survey of Professional Forecasters (SPF), which is important not only in facilitating empirical academic research in macroeconomics and financial economics, but also—and crucially—in guiding real-time policy, business, and investment management decisions.\footnote{On real-time policy and its evaluation, see John Taylor's inaugural NBER Feldstein Lecture at https://www.hoover.org/sites/default/files/gmwg-empirically-evaluating-economic-policy-in-real-time.pdf.}\footnote{For an introduction to the SPF, see the materials at https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/survey-of-professional-forecasters.}

In particular, we study SPF crowd behavior as crowd size grows, asking precisely the same sorts of questions of SPF “forecast portfolios” that one asks of financial asset portfolios: How quickly, and with what patterns, do diversification benefits become operative, and eventually dissipate, as portfolio size (the number of forecasters) grows, and why? Do the results differ across variables (e.g., growth vs. inflation), and if so, why? What are the implications for survey size and design? We are of course not the first to ask such questions. Classic early work on which we build includes, for example, Makridakis and Winkler (1983) and Batchelor and Dua (1995). We progress much farther, however, particularly as regards analytic characterization.

We answer the above questions using what we call “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We examine not only direct signature plots (empirically), but also model-based signature plots (analytically, based on a simple model of forecast-error equicorrelation), after which we proceed to estimate and assess the equicorrelation model.

To fix ideas and notation, let us sketch the basic framework with some precision. Let $N$ refer to a set of forecasts with $N\times 1$ zero-mean time-$t$ error vector $e_{t}$, $t=1,\ldots,T$, and let $k\leq N$ refer to a subset of forecasts. We consider $k$-forecast averages, and we seek to characterize $k$-forecast mean-squared forecast error ($MSE$). For a particular $k$-forecast average corresponding to group $g^{*}_{k}$, the forecast error is just the average of the individual forecast errors, so we have

$$\widehat{MSE}^{*}_{T}(k)=\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g^{*}_{k}}e_{it}\right)^{2}. \qquad (1)$$

For any choice of $k$, however, there are $\binom{N}{k}$ possible $k$-forecast averages. We focus on the $k$-average $\widehat{MSE}^{*}_{T}(k)$ given by equation (1), averaged across all groups of size $k$,

$$\widehat{MSE}^{avg}_{NT}(k)=\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right), \qquad (2)$$

where $g_{k}$ is an arbitrary member of the set of groups of size $k$.
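To make equation (2) concrete, here is a minimal Python sketch (our illustration, not code from the paper) that computes $\widehat{MSE}^{avg}_{NT}(k)$ exactly by enumerating all $\binom{N}{k}$ groups; the function name and the simulated errors are ours, and the exact enumeration is feasible only for small $N$, which is precisely why the approximation of section 2.2 is needed.

\begin{verbatim}
# Exact computation of equation (2) for small N: average the T-sample MSE of
# the k-average forecast error over all C(N, k) groups of forecasters.
import numpy as np
from itertools import combinations
from math import comb

def mse_avg_exact(errors, k):
    """errors: T x N array of individual forecast errors e_{it}."""
    T, N = errors.shape
    total = 0.0
    for group in combinations(range(N), k):            # every group g_k of size k
        avg_err = errors[:, list(group)].mean(axis=1)  # k-average error, each t
        total += np.mean(avg_err ** 2)                 # MSE^*_T(k) for this group
    return total / comb(N, k)                          # average over all groups

# Illustrative use with simulated errors (N kept small so enumeration is cheap):
rng = np.random.default_rng(0)
e = rng.standard_normal((160, 8))
print([round(mse_avg_exact(e, k), 3) for k in (1, 2, 4, 8)])
\end{verbatim}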

Among other things, we are interested in:

(a) Tracking and visualizing $\widehat{MSE}^{avg}_{NT}(k)$ as $k$ grows (“$\widehat{MSE}^{avg}_{NT}(k)$ crowd size signature plots”);

(b) Tracking and visualizing the change (improvement) in $\widehat{MSE}^{avg}_{NT}(k)$ from adding one more forecast to the pool (i.e., moving from $k$ to $k+1$ forecasts),
$$\widehat{DMSE}^{avg}_{NT}(k)=\widehat{MSE}^{avg}_{NT}(k)-\widehat{MSE}^{avg}_{NT}(k+1), \qquad (3)$$
as $k$ grows (“$\widehat{DMSE}^{avg}_{NT}(k)$ crowd size signature plots”);

(c) Tracking and visualizing the average performance from $k$-averaging ($\widehat{MSE}^{avg}_{NT}(k)$) relative to the performance from no averaging,
$$\widehat{R}^{avg}_{NT}(k)=\frac{\widehat{MSE}^{avg}_{NT}(k)}{\widehat{MSE}^{avg}_{NT}(1)} \qquad (4)$$
as $k$ grows (“$\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots”, where we use “$R$” to denote “ratio”);

(d) Tracking and visualizing not just mean squared-error performance as $k$ grows, as in all of the above signature plots, but rather the complete distributional squared-error performance as $k$ grows (“$\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots”, where $\widehat{f}^{avg}_{NT}(k)$ is the average empirical distribution of squared $k$-average forecast errors for crowd size $k$);

(e) Understanding paths and patterns in the above signature plots as $k$ grows, whether obtained by direct analysis of the SPF data, or by analysis of (equicorrelation) models fit to the SPF data;

(f) Assessing the equicorrelation model by comparing direct and model-based SPF signature plot estimates;

(g) Understanding similarities and differences in results across variables (growth vs. inflation);

(h) Drawing practical implications for SPF design.

We proceed as follows. In section 2 we study the SPF, and we estimate its crowd size signature plots directly. In section 3 we study an equicorrelation model, and we characterize its crowd size signature plots analytically for any parameter configuration. In section 4 we estimate and assess the equicorrelation model by choosing its parameters to minimize divergence between direct and model-based signature plots. In section 5 we conclude and sketch several directions for future research.

2 Direct Crowd Size Signature Plots

Figure 1: SPF Participation
Notes: We show the number of participants in the U.S. Survey of Professional Forecasters, 1968Q4-2023Q2.

Figure 2: SPF Forecast Error Means and Standard Deviations
Panels: Growth Mean, Growth S.D., Inflation Mean, Inflation S.D.
Notes: We show time series of cross-sectional means and standard deviations of individual 1-step-ahead forecast errors, 1968Q4-2023Q2. Gray shaded regions denote recessions.

The U.S. Survey of Professional Forecasters is a quarterly survey covering several U.S. macroeconomic variables. It was started in 1968Q4 and is currently conducted and maintained by the Federal Reserve Bank of Philadelphia.\footnote{For a recent introduction see Croushore and Stark (2019).} In Figure 1 we show the evolution of the number of survey participants, which declined until 1990Q2, when the Federal Reserve Bank of Philadelphia took control of the survey; since then the survey has had approximately 40 participants. Participants stayed for 15 quarters on average, with a minimum of 1 quarter and a maximum of 125 quarters.

We analyze SPF point forecasts for real output growth (“growth”) and GDP deflator inflation (“inflation”), for forecast horizons $h=1,2,3,$ and $4$, corresponding to short-, medium-, and longer-term forecasts.\footnote{The SPF contains quarterly level forecasts of real GDP and the GDP implicit price deflator. We transform the level forecasts into growth and inflation forecasts by computing annualized quarter-on-quarter growth rates, and we compute the corresponding forecast errors using realized values as of December 2023. See Appendix A for details.} Our sample period is 1968Q4-2023Q2, during which the survey had 38 participants per survey on average.

2.1 SPF Forecast Errors

Because individual forecast errors drive our analysis, as per equation (2), we begin by examining their evolving period-by-period cross-sectional distributions. In Figure 2 we show the time series of cross-sectional means and standard deviations. Several features are apparent:

(a) The Great Moderation is clearly reflected in both the growth and inflation error distributions, which show noticeably reduced variability from the end of the Volcker Recession to the start of the Great Recession.

(b) Growth tends to be over-predicted when entering recessions; that is, the mean of the growth error (actual minus predicted) distribution tends to be negative. Hence recessions tend to catch forecasters by surprise. The Pandemic Recession provides the most extreme example, as the mean growth error plunges.

(c) Growth is, however, sometimes systematically under-predicted during recoveries. The Pandemic Recession again provides the most extreme example, as the mean growth error leaps skyward.

(d) Inflation shows little such systematic over- or under-prediction when entering or exiting recessions, except for the entry into the Oil Shock Recession, when inflation was noticeably under-predicted.

(e) The variability of the growth error distribution increases during recessions, most notably during the Great Recession and the Pandemic Recession, reflecting greater disagreement among forecasters. The same is true of the inflation error distribution during those two recessions.

(f) The unusual behavior of the inflation error distribution following the Pandemic Recession is clearly revealed. The mean error there is always positive (i.e., forecasters tended to under-predict), with the amount of under-prediction first growing and then shrinking. Inflation error variability follows a similar path.

2.2 Directly-Estimated Crowd Size Signature Plots

Figure 3: Direct $\widehat{MSE}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{MSE}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

Figure 4: Direct $\widehat{DMSE}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{DMSE}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

In principle we want to compute $\widehat{MSE}^{avg}_{NT}(k)$, but that is impossible in practice unless $N$ is very small, due to the potentially huge number of distinct $k$-average forecasts.\footnote{For example, for $N=40$, which is a realistic value for surveys of forecasters, and $k=20$, we obtain $\binom{N}{k}=1.4\times 10^{11}$.} Hence we proceed by approximating $\widehat{MSE}^{avg}_{NT}(k)$ as follows (see the sketch after this list):

(a) Randomly select a $k$-average forecast $g^{*}_{k}$, and calculate $\widehat{MSE}^{*}_{T}(k)$.

(b) Repeat $B$ times, and average the $\widehat{MSE}^{*}_{T}(k)$ values across the $B$ draws, where $B$ is large, but not so large as to be computationally intractable.\footnote{In this paper we use $B=30{,}000$.}
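The following minimal Python sketch (again our illustration, with our own function names) implements steps (a)-(b) for an arbitrary $T\times N$ array of forecast errors; the paper uses $B=30{,}000$, and the smaller $B$ below is only to keep the example fast.

\begin{verbatim}
# Monte Carlo approximation of MSE^avg_NT(k): draw B random groups of size k,
# compute each group's T-sample MSE, and average across the B draws.
import numpy as np

def mse_avg_mc(errors, k, B=30_000, rng=None):
    """errors: T x N array of forecast errors e_{it}."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, N = errors.shape
    total = 0.0
    for _ in range(B):
        group = rng.choice(N, size=k, replace=False)  # step (a): random group g*_k
        avg_err = errors[:, group].mean(axis=1)       # k-average forecast error
        total += np.mean(avg_err ** 2)                # MSE^*_T(k) for this draw
    return total / B                                  # step (b): average over draws

# Example: an approximate signature plot on simulated errors.
rng = np.random.default_rng(1)
e = rng.standard_normal((160, 40))
signature = [mse_avg_mc(e, k, B=2_000, rng=rng) for k in range(1, 21)]
\end{verbatim}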

We show direct crowd size signature plots for growth and inflation, for horizons $h=1,2,3,4$, in Figure 3 ($\widehat{MSE}^{avg}_{NT}(k)$ plots) and Figure 4 ($\widehat{DMSE}^{avg}_{NT}(k)$ plots). Several features are apparent:

(a) For both growth and inflation, the $\widehat{MSE}^{avg}_{NT}(k)$ signature plot is lowest for $h=1$, with the signature plots for $h=2$, $3$, and $4$ progressively farther above the $h=1$ plot in roughly parallel upward shifts, reflecting the fact that the near future is generally easier to predict than the more-distant future.

(b) The growth $\widehat{MSE}^{avg}_{NT}(k)$ signature plot does not rise much from $h=3$ to $h=4$, in contrast to the inflation $\widehat{MSE}^{avg}_{NT}(k)$ signature plot, suggesting that growth predictability drops with horizon more quickly than inflation predictability, effectively vanishing by $h=3$.

(c) For both growth and inflation and all forecast horizons, the reduction in $\widehat{MSE}^{avg}_{NT}(k)$ from $k=1$ to $k=5$ dwarfs the improvement from moving from $k=6$ to $k=20$, as visually emphasized by the $\widehat{DMSE}^{avg}_{NT}(k)$ signature plots in Figure 4. Hence there is little benefit from adding representative forecasters to the pool beyond $k=5$.

(d) For both growth and inflation and all forecast horizons, the $\widehat{DMSE}^{avg}_{NT}(k)$ signature plots are approximately the same across horizons (i.e., no upward shifts), which is expected because the $\widehat{MSE}^{avg}_{NT}(k)$ signature plots shift with horizon in approximately parallel fashion, leaving the “first derivative” ($\widehat{DMSE}^{avg}_{NT}(k)$) unchanged.

Figure 5: Direct $\widehat{R}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show direct $\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

We show direct growth and inflation $\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots in Figure 5. They are simply the $\widehat{MSE}^{avg}_{NT}(k)$ plots of Figure 3, scaled by $\widehat{MSE}^{avg}_{NT}(1)$ (the benchmark MSE corresponding to no averaging), so that $\widehat{R}^{avg}_{NT}(1)\equiv 1$. The $\widehat{R}^{avg}_{NT}(k)$ plots facilitate $\widehat{MSE}^{avg}_{NT}(k)$ comparisons across growth and inflation, particularly when a common vertical scale is used, as in Figure 5. It is immediately apparent that the growth and inflation $\widehat{R}^{avg}_{NT}(k)$ plots asymptote to very different levels as $k$ increases – approximately 80% for growth and 60% for inflation – which highlights an important result not discussed thus far: the benefits of SPF “portfolio diversification” appear substantially greater for inflation than for growth, presumably due to lower correlation among the inflation forecast errors. We return to this issue when we study and estimate models of equicorrelated forecast errors in sections 3 and 4 below.

Figure 6: Direct $\widehat{f}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizon $h=1$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

Finally, we show growth and inflation $\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots in Figure 6. In particular, we show boxplots of squared 1-step-ahead $k$-average forecast errors for $k=1,\ldots,20$.\footnote{The boxplots display the median, the first and third quartiles, the lower extreme value (first quartile minus 1.5 times the interquartile range), the upper extreme value (third quartile plus 1.5 times the interquartile range), and outliers.} Both the growth and inflation forecast error distributions are highly right-skewed for small $k$, but they become less variable and more symmetric as $k$ grows and the central limit theorem (CLT) becomes operative, which happens noticeably less quickly for growth than for inflation. It is interesting to note, moreover, that for both growth and inflation the worst-case (maximum) $MSE$ decreases in $k$ (“bad luck” resulting in high $MSE$ happens easily for small $k$ but is averaged away as $k$ increases and the CLT becomes operative), whereas the best-case (minimum) $MSE$ increases in $k$ (“good luck” resulting in low $MSE$ likewise happens easily for small $k$ but is averaged away as $k$ increases).
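A minimal sketch of the distributional object behind Figure 6, under our own naming conventions: rather than averaging the squared $k$-average errors into a single MSE, one keeps the whole pooled collection of squared errors for each $k$ and summarizes its quartiles (the ingredients of a boxplot).

\begin{verbatim}
# Pool squared k-average forecast errors across random groups and dates, then
# summarize the pooled distribution; quartiles shown here, as in a boxplot.
import numpy as np

def squared_error_distribution(errors, k, B=2_000, rng=None):
    """errors: T x N array of forecast errors; returns pooled squared k-average errors."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, N = errors.shape
    pooled = []
    for _ in range(B):
        group = rng.choice(N, size=k, replace=False)
        pooled.append(errors[:, group].mean(axis=1) ** 2)  # squared k-average errors, all t
    return np.concatenate(pooled)

rng = np.random.default_rng(2)
e = rng.standard_normal((160, 40))
for k in (1, 5, 20):
    q1, med, q3 = np.percentile(squared_error_distribution(e, k, rng=rng), [25, 50, 75])
    print(f"k={k:2d}  Q1={q1:.3f}  median={med:.3f}  Q3={q3:.3f}")
\end{verbatim}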

3 Model-Based Crowd Size Signature Plots

Having empirically characterized crowd size signature plots directly in the SPF data, we now proceed to characterize them analytically in a simple covariance-stationary equicorrelation model, in which $e_{t}\sim(0,\Sigma)$, where $0$ is the $N\times 1$ zero vector and $\Sigma$ is an $N\times N$ forecast-error covariance matrix displaying equicorrelation, by which we mean that all variances are identical and all implied correlations are identical.

3.1 Equicorrelated Forecast Errors

A trivial equicorrelation example occurs when $\Sigma=\sigma^{2}I$, where $I$ denotes the $N\times N$ identity matrix, so that all variances are equal ($\sigma^{2}$), and all correlations are equal (0). Of course the zero-correlation case is unrealistic, because, for example, economic forecast errors are invariably positively correlated due to overlap of information sets, but it will serve as a useful benchmark, so we begin with it.

Simple averaging is the fully optimal forecast combination in the zero-correlation environment, which is obvious since the forecasts are exchangeable. More formally, the optimality of simple averaging (equal combining weights) follows from the multivariate Bates and Granger (1969) formula for $MSE$-optimal combining weights,

$$\lambda^{*}=\left(\iota^{\prime}\Sigma^{-1}\iota\right)^{-1}\Sigma^{-1}\iota, \qquad (5)$$

where $\iota$ is a conformable ($N\times 1$) column vector of ones. For $\Sigma=\sigma^{2}I$ the optimal weights collapse to

$$\lambda^{*}=(\sigma^{-2}N)^{-1}\sigma^{-2}\iota=\frac{1}{N}\iota.$$

Analytical results for $MSE^{avg}_{NT}(k;\sigma)$ are straightforward for simple averages in the zero-correlation environment. Immediately, for $T$ sufficiently large, we have

\begin{align}
MSE^{avg}_{NT}(k;\sigma) &= \operatorname*{plim}_{T\rightarrow\infty}\left(\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right)\right) \qquad (6)\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\, E\left[\left(\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\sum_{i\in g_{k}} E\left[e_{it}^{2}\right]\\
&= \frac{\sigma^{2}}{k}.
\end{align}

Moreover,

$$DMSE^{avg}_{NT}(k;\sigma)=\frac{\sigma^{2}}{k(k+1)} \qquad (7)$$

and

$$R^{avg}_{NT}(k)=\frac{1}{k}. \qquad (8)$$

(Notice that $\sigma$ cancels in the $R^{avg}_{NT}(k;\sigma)$ calculation, so we simply write $R^{avg}_{NT}(k)$.)

We now move to a richer equicorrelation case with equal but nonzero correlations, but still with equal variances (we refer to it as “strong equicorrelation”, or simply “equicorrelation” when the meaning is clear from context), so that instead of $\Sigma=\sigma^{2}I$ we have

$$\Sigma=\sigma^{2}R, \qquad (9)$$

where

$$R=\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}, \qquad (10)$$

and $\rho\in\left]\frac{-1}{N-1},1\right[$.\footnote{$R$ is positive definite if and only if $\rho\in\left]\frac{-1}{N-1},1\right[$. See Lemma 2.1 of Engle and Kelly (2012).} Recent work, in particular Engle and Kelly (2012), has made use of equicorrelation in the context of modeling multivariate financial asset return volatility.

Importantly, the optimality of simple averaging under zero correlation is preserved under equicorrelation.\footnote{That is, equicorrelation is sufficient for the optimality of simple averaging. Elliott (2011) shows that a necessary and sufficient condition for optimality of simple averaging is that the row sums of $\Sigma$ be equal. Equicorrelation is one such case, although there are of course others, obtained by manipulating correlations in their relation to variances to keep row sums equal, but none are nearly so compelling and readily interpretable as equicorrelation.} To see why, consider the inverse covariance matrix in the expression for the optimal combining weight vector, (5). In the equicorrelation case we have

$$\Sigma^{-1}=\frac{1}{\sigma^{2}}R^{-1}, \qquad (11)$$

where\footnote{See Lemma 2.1 of Engle and Kelly (2012).}

$$R^{-1}=\frac{1}{1-\rho}I-\frac{\rho}{(1-\rho)(1+(N-1)\rho)}\iota\iota^{\prime}. \qquad (12)$$

Then, using equation (12), the first part of the optimal combining weight (5) is

$$\iota^{\prime}\Sigma^{-1}\iota=\frac{N}{\sigma^{2}}\,\frac{(1+(N-1)\rho)-\rho N}{(1-\rho)(1+(N-1)\rho)}, \qquad (13)$$

and the second part is

$$\Sigma^{-1}\iota=\frac{1}{\sigma^{2}}\,\frac{(1+(N-1)\rho)-\rho N}{(1-\rho)(1+(N-1)\rho)}\,\iota. \qquad (14)$$

Inserting equations (13) and (14) into equation (5) yields

$$\lambda^{*}=\frac{1}{N}\iota, \qquad (15)$$

establishing the optimality of equal weights.
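As a quick numerical check (our illustration, with the values of $N$, $\sigma$, and $\rho$ assumed purely for the example), one can construct an equicorrelation matrix and confirm both that the Bates-Granger weights in (5) reduce to $\iota/N$ and that the closed form (12) indeed inverts $R$:

\begin{verbatim}
# Numerical check: under equicorrelation the optimal weights are 1/N, and the
# closed-form inverse in equation (12) matches a direct matrix inversion.
import numpy as np

N, sigma, rho = 40, 1.5, 0.5                          # assumed illustrative values
iota = np.ones((N, 1))
R = (1 - rho) * np.eye(N) + rho * np.ones((N, N))     # equicorrelation matrix, eq. (10)
Sigma = sigma**2 * R                                  # eq. (9)

Sigma_inv = np.linalg.inv(Sigma)
lam = Sigma_inv @ iota / (iota.T @ Sigma_inv @ iota)  # Bates-Granger weights, eq. (5)
print(np.allclose(lam, iota / N))                     # True: equal weights are optimal

R_inv = np.eye(N) / (1 - rho) - rho / ((1 - rho) * (1 + (N - 1) * rho)) * np.ones((N, N))
print(np.allclose(R_inv, np.linalg.inv(R)))           # True: eq. (12) holds
\end{verbatim}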

Having now introduced equicorrelation and shown that it implies optimality of simple average forecast combinations, it is of interest to assess whether it is a potentially reasonable model for sets of survey forecast errors. The answer is yes. First, obviously but importantly, the information sets of economic forecasters are quite highly overlapping, so it is not an unreasonable approximation to suppose that various pairs of forecast errors will be positively and similarly correlated.

Second, less obviously but also importantly, equicorrelation is closely linked to factor structure, which is a great workhorse of modern macroeconomics and business-cycle analysis (e.g., Stock and Watson, 2016). In particular, equicorrelation arises when forecast errors have single-factor structure with equal factor loadings and equal idiosyncratic shock variances, as in:

\begin{align}
e_{it} &= \delta z_{t}+w_{it} \qquad (16)\\
z_{t} &= \phi z_{t-1}+v_{t},
\end{align}

where $w_{it}\sim iid(0,\sigma_{w}^{2})$, $v_{t}\sim iid(0,\sigma_{v}^{2})$, and $w_{it}\perp v_{t}$, $\forall\, i,t$, $i=1,\ldots,N$, $t=1,\ldots,T$. In Appendix B we also explore a less-restrictive form of factor structure that produces a less-restrictive form of equicorrelation (“weak equicorrelation”).
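To make the link explicit (a short derivation added here for completeness, assuming $|\phi|<1$ so that $z_{t}$ is covariance stationary with variance $\sigma_{z}^{2}=\sigma_{v}^{2}/(1-\phi^{2})$, and assuming $w_{it}$ is uncorrelated with $z_{t}$, as the orthogonality conditions above imply), the factor model (16) delivers
$$\operatorname{Var}(e_{it})=\delta^{2}\sigma_{z}^{2}+\sigma_{w}^{2}\equiv\sigma^{2},\qquad \operatorname{Cov}(e_{it},e_{jt})=\delta^{2}\sigma_{z}^{2}\;\;(i\neq j),\qquad \rho=\frac{\delta^{2}\sigma_{z}^{2}}{\delta^{2}\sigma_{z}^{2}+\sigma_{w}^{2}},$$
so that all error variances are equal and all pairwise error correlations are equal, as equicorrelation requires.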

Finally, a large literature from the 1980s onward documents the routine outstanding empirical performance of simple average forecast combinations, despite the fact that simple averages are not optimal in general (e.g., Clemen, 1989; Genre et al., 2013; Elliott and Timmermann, 2016; Diebold and Shin, 2019). As we have seen, however, equicorrelation is sufficient (and almost necessary) for optimality of simple averages, so that if simple averages routinely perform well, then the equicorrelation model is routinely reasonable – and the natural model to pair with the simple averages embodied in the SPF.

3.2 Analytic Equicorrelation Crowd Size Signature Plots

Analytical results for $MSE^{avg}_{NT}(\cdot)$, $DMSE^{avg}_{NT}(\cdot)$, and $R^{avg}_{NT}(\cdot)$ are easy to obtain under equicorrelation, just as they were under zero correlation. Immediately, for $T$ sufficiently large,

\begin{align}
MSE^{avg}_{NT}(k;\rho,\sigma) &= \operatorname*{plim}_{T\rightarrow\infty}\left(\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right)\right) \qquad (17)\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\, E\left[\left(\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\left[k\sigma^{2}+k(k-1)\,Cov(e_{it},e_{jt})\right]\quad\text{for } i\neq j\\
&= \frac{\sigma^{2}}{k}\left[1+(k-1)\rho\right].
\end{align}

Moreover,

$$DMSE^{avg}_{NT}(k;\rho,\sigma)=\frac{\sigma^{2}}{k(k+1)}(1-\rho) \qquad (18)$$

and

$$R^{avg}_{NT}(k;\rho)=\frac{1}{k}\left[1+(k-1)\rho\right]. \qquad (19)$$

(Note that $\sigma$ cancels in the $R^{avg}_{NT}(k;\rho,\sigma)$ calculation, so we simply write $R^{avg}_{NT}(k;\rho)$.) If $\rho=0$, the result (17) for $MSE^{avg}_{NT}(k;\rho,\sigma)$ in the equicorrelation case of course collapses to the earlier result (6) for $MSE^{avg}_{NT}(k;\sigma)$ in the zero-correlation case.
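The following minimal simulation sketch (our illustration, with all parameter values assumed for the example) generates equicorrelated errors from a single-factor representation and checks that simulated $k$-average MSEs track the closed form $\frac{\sigma^{2}}{k}\left[1+(k-1)\rho\right]$:

\begin{verbatim}
# Simulation check of equation (17): equicorrelated errors have k-average MSE
# equal to sigma^2/k * (1 + (k-1)*rho). Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
N, T, sigma, rho = 40, 100_000, 1.0, 0.5                       # assumed example values
common = np.sqrt(rho) * sigma * rng.standard_normal(T)         # common component
idio = np.sqrt(1 - rho) * sigma * rng.standard_normal((T, N))  # idiosyncratic components
e = common[:, None] + idio                                     # Var = sigma^2, Corr = rho

for k in (1, 2, 5, 10, 20):
    sim = np.mean(e[:, :k].mean(axis=1) ** 2)    # any k forecasters will do: exchangeable
    closed = sigma**2 / k * (1 + (k - 1) * rho)  # equation (17)
    print(f"k={k:2d}  simulated={sim:.4f}  closed-form={closed:.4f}")
\end{verbatim}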

Figure 7: Theoretical Equicorrelation $MSE^{avg}_{NT}(k;\rho,1)$ Crowd Size Signature Plots
Notes: We show equicorrelation $MSE^{avg}_{NT}(k;\rho,1)$ crowd size signature plots for group sizes $k=1,\ldots,20$, and equicorrelations $\rho=0.1,0.5,0.9$.

Figure 8: Theoretical Equicorrelation $MSE^{min}_{NT}(k;.5,1)$, $MSE^{avg}_{NT}(k;.5,1)$, and $MSE^{max}_{NT}(k;.5,1)$ Crowd Size Signature Plots
Notes: The data generating process is the equicorrelation model given by equations (9)-(10), with $\rho=.5$, $\sigma=1$, $N=40$, and $T=160$. For each group size $k$, 30,000 groups of size $k$ were drawn at random for each $t=1,\ldots,T$ to produce the figure.

In Figure 7 we show $MSE^{avg}_{NT}(k;\rho,\sigma)$ as a function of $k$, for various equicorrelations $\rho$, with $\sigma=1$. From equation (17), the height of each curve at $k=1$ is simply $\sigma=1$, and the curves decrease for any fixed $\rho$ to a limiting value ($\rho$) as the combining pool grows ($k\rightarrow\infty$).\footnote{Indeed under equicorrelation with $\sigma=1$, as here, $MSE^{avg}_{NT}(k;\rho,\sigma)=R^{avg}_{NT}(k;\rho,\sigma)$.} Indeed the gains from increasing $k$ are initially large but decrease quickly. The $MSE$ improvement, for example, in moving from $k=1$ to $k=5$ consistently dwarfs that of moving from $k=5$ to $k=20$.

Overall, then, the value of increasing the pool size (i.e., increasing $k$) is highest when $k$ is small (small pool), when $\rho$ is low (weakly-correlated forecast errors), or when $\sigma^{2}$ is high (volatile forecast errors). In particular, for realistic values of $\rho$, around 0.5, say, most gains from increasing $k$ are obtained by $k=5$.
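For concreteness, the following short Python sketch (ours, with purely illustrative parameter values) evaluates the equicorrelation signature plot $\frac{\sigma^{2}}{k}[1+(k-1)\rho]$ and the associated gains; it reproduces, for example, the fact that the $k=1$ to $k=5$ improvement dwarfs the $k=5$ to $k=20$ improvement at $\rho=0.5$.
\begin{verbatim}
import numpy as np

def mse_avg(k, rho, sigma2=1.0):
    # Equicorrelation k-average forecast-error MSE: (sigma^2 / k) * [1 + (k - 1) * rho].
    return sigma2 / k * (1.0 + (k - 1.0) * rho)

ks = np.arange(1, 21)
for rho in (0.1, 0.5, 0.9):
    curve = mse_avg(ks, rho)
    print(f"rho={rho}: MSE(1)={curve[0]:.3f}  MSE(5)={curve[4]:.3f}  MSE(20)={curve[19]:.3f}  "
          f"gain 1->5={curve[0] - curve[4]:.3f}  gain 5->20={curve[4] - curve[19]:.3f}")
\end{verbatim}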

Several additional remarks are in order:

  (a) The fact that, for realistic values of $\rho$, most $MSE^{avg}_{NT}(k;\rho,1)$ gains from increasing $k$ are obtained by $k=5$ does not necessarily indicate that typical surveys use too many forecasters. $MSE^{avg}_{NT}(k;\rho,1)$ is an average across all $k$-forecast combinations, and the best and worst $k$-average combinations, for example, will have very different MSEs. Figure 8 speaks to this; it shows $MSE^{min}_{NT}(k;\rho,\sigma)$, $MSE^{avg}_{NT}(k;\rho,\sigma)$, and $MSE^{max}_{NT}(k;\rho,\sigma)$ under equicorrelation with $\rho=0.5$ and $\sigma=1$, for $k=1,\ldots,20$.\footnote{We use $N=40$ as an approximation to the average number of forecasters participating in the SPF in any given quarter, and we use $T=160$ to mimic the total sample size when working with 40 years of quarterly data, as in the SPF.}

  (b) The equicorrelation case is the only one for which analytic results are readily obtainable. For example, even if we maintain the assumption of equal correlations but simply allow different forecast error variances (“weak equicorrelation”), the $MSE$ of the $k$-person average forecast becomes a function of $(k,\rho,\sigma_{1}^{2},\ldots,\sigma_{N}^{2})$, and little more can be said.\footnote{See Appendix C for derivation of optimal combining weights in the weak equicorrelation case.}

  (c) As mentioned earlier, the equicorrelation case naturally matches the provision of survey averages, because in that case simple averages are optimal. Hence, as we now proceed to a model-based empirical analysis of real forecasters, we work with the equicorrelation model, asking what values of $\rho$ and $\sigma$ make the equicorrelation model-based $MSE^{avg}_{NT}(k;\rho,\sigma)$ signature plot as close as possible to the direct $\widehat{MSE}^{avg}_{NT}(k)$ signature plot.

4 Estimating the Equicorrelation Model

Here we estimate the equicorrelation model by choosing its parameters $\rho$ and $\sigma$ to make the equicorrelation $MSE^{avg}_{NT}(k;\rho,\sigma)$ as close as possible to the SPF $\widehat{MSE}^{avg}_{NT}(k)$. This estimation strategy is closely related to, but different from, GMM estimation. Rather than matching model and data moments, it matches more interesting and interpretable functions of those moments, namely model-based and direct crowd size signature plots, as per the “indirect inference” of Smith Jr (1993) and Gourieroux et al. (1993). Henceforth we refer to it simply as the “matching estimator”.

Specifically, we solve for $(\hat{\rho},\hat{\sigma})$ such that

\[
(\hat{\rho},\hat{\sigma}) = \arg\min_{(\rho,\sigma)} Q(\rho,\sigma), \tag{20}
\]

where

\[
Q(\rho,\sigma) = \frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)^{2},
\]

and the minimization is constrained such that $\sigma>0$ and $\rho\in\left]\frac{-1}{N-1},1\right[$.

4.1 Calculating the Solution

Calculating the minimum in equation (20) is very simple, because the bivariate minimization can be reduced to a univariate minimization in $\rho\in\left]\frac{-1}{N-1},1\right[$. To see this, recall that $MSE^{avg}_{NT}(k;\rho,\sigma)=\frac{\sigma^{2}}{k}[1+(k-1)\rho]$, so that the first-order condition for $\sigma^{2}$ is

\[
\frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)\left(\frac{1}{k}+\frac{k-1}{k}\rho\right)=0, \tag{21}
\]

and the first-order condition for $\rho$ is

\[
\frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)\left(1-\frac{1}{k}\right)\sigma^{2}=0. \tag{22}
\]

Combining equations (21) and (22) yields

\[
\sigma^{2}=\frac{c_{1}}{c_{2}+c_{3}\rho}, \tag{23}
\]

where $c_{1}=\sum_{k=1}^{N}\frac{\widehat{MSE}^{avg}_{NT}(k)}{k}$, $c_{2}=\sum_{k=1}^{N}\frac{1}{k^{2}}$, and $c_{3}=\sum_{k=1}^{N}\frac{k-1}{k^{2}}$. Hence, at the optimum and conditional on the data, there is a deterministic inverse relationship between $\rho$ and $\sigma$ (see Figure 9), enabling one to restrict the parameter search to the small open interval $\rho\in\left]\frac{-1}{N-1},1\right[$, as well as to explore the objective function visually as a function of $\rho$ alone (see Figure 10).
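As an illustration, here is a minimal Python sketch (ours) of the matching estimator, concentrating $\sigma^{2}$ out of the objective via equation (23) and then grid-searching over $\rho$; the direct signature plot mse_hat is assumed to be supplied by the user, and the grid resolution is an arbitrary choice.
\begin{verbatim}
import numpy as np

def fit_equicorrelation(mse_hat):
    # mse_hat[k-1] is the direct signature plot MSE-hat(k), k = 1, ..., N.
    N = len(mse_hat)
    k = np.arange(1, N + 1)
    # Constants from equation (23): at the optimum, sigma^2 = c1 / (c2 + c3 * rho).
    c1 = np.sum(mse_hat / k)
    c2 = np.sum(1.0 / k**2)
    c3 = np.sum((k - 1.0) / k**2)
    best = None
    for rho in np.linspace(-1.0 / (N - 1) + 1e-6, 1.0 - 1e-6, 20001):
        sigma2 = c1 / (c2 + c3 * rho)
        model = sigma2 / k * (1.0 + (k - 1.0) * rho)
        q = np.mean((mse_hat - model) ** 2)
        if best is None or q < best[0]:
            best = (q, rho, sigma2)
    return best  # (Q, rho_hat, sigma2_hat)

# Check: with an exactly equicorrelated input plot, the truth is recovered.
k = np.arange(1, 41)
print(fit_equicorrelation(18.6 / k * (1.0 + (k - 1.0) * 0.8)))  # Q ~ 0, rho ~ 0.8, sigma2 ~ 18.6
\end{verbatim}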

Figure 9: Illustration of the Relationship Between $\rho$ and $\sigma^{2}$
Notes: We show the relationship between $\rho$ and $\sigma^{2}$ given by equation (23), for 1-step-ahead growth forecast errors. The highlighted values of $\hat{\rho}=0.8$ and $\hat{\sigma}^{2}=18.6$ are the estimated values.

Figure 10: Illustration of the Objective Function and its Minimum
Left panel: $Q(\rho)$. Right panel: $Q(\rho)$ in a neighborhood of $\hat{\rho}$.
Notes: We show the objective function $Q(\rho)$ of the matching estimator expressed as a function of $\rho$, for 1-step-ahead growth forecast errors. The highlighted value of $\hat{\rho}=0.8$ is the estimated value.

Table 1: Equicorrelation Model Estimates

Growth
                                            h=1         h=2         h=3         h=4
  $\hat{\sigma}^{2}$                        18.562      21.170      22.713      23.275
                                            (0.606)     (0.546)     (0.589)     (0.646)
  $\hat{\rho}$                              0.801       0.843       0.842       0.831
                                            (0.036)     (0.028)     (0.028)     (0.030)
  $\widehat{R}^{avg}_{NT}(1;\hat{\rho})$    1.000       1.000       1.000       1.000
  $\widehat{R}^{avg}_{NT}(5;\hat{\rho})$    0.841       0.874       0.874       0.865
  $\widehat{R}^{avg}_{NT}(15;\hat{\rho})$   0.815       0.853       0.853       0.842
  $Q(\hat{\sigma}^{2},\hat{\rho})$          5.99E-05    3.95E-06    7.95E-06    1.01E-05

Inflation
                                            h=1         h=2         h=3         h=4
  $\hat{\sigma}^{2}$                        3.662       4.343       5.123       6.094
                                            (0.253)     (0.255)     (0.295)     (0.370)
  $\hat{\rho}$                              0.580       0.644       0.650       0.630
                                            (0.082)     (0.068)     (0.066)     (0.071)
  $\widehat{R}^{avg}_{NT}(1;\hat{\rho})$    1.000       1.000       1.000       1.000
  $\widehat{R}^{avg}_{NT}(5;\hat{\rho})$    0.664       0.715       0.720       0.704
  $\widehat{R}^{avg}_{NT}(15;\hat{\rho})$   0.608       0.668       0.673       0.655
  $Q(\hat{\sigma}^{2},\hat{\rho})$          2.21E-06    9.56E-07    1.82E-06    4.36E-05

Notes: We show equicorrelation model parameter estimates for SPF growth and inflation forecast errors at various horizons, with standard errors computed via 1000 bootstrap samples. We also show estimated relative $MSE$ with respect to no averaging, $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$, for $k=1,5,15$. In the final line of each panel we show the value of the objective function evaluated at the estimated parameters, $Q(\hat{\sigma}^{2},\hat{\rho})$.

Figure 11: Estimated Equicorrelation $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ Crowd Size Signature Plots
Left panel: Growth. Right panel: Inflation.
Notes: We show estimated equicorrelation $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$.

Figure 12: Direct $\widehat{R}^{avg}_{NT}(k)$ and Estimated Equicorrelation $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$ Crowd Size Signature Plots
Top row: Direct, Growth (left) and Est. Equicorrelation, Growth (right). Bottom row: Direct, Inflation (left) and Est. Equicorrelation, Inflation (right).
Notes: We show direct and estimated equicorrelation ratio crowd size signature plots ($\widehat{R}^{avg}_{NT}(k)$ and $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$, respectively) for growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$.

In Table 1 we show the complete set of estimates (for $\sigma^{2}$ and $\rho$, for growth and inflation, for $h=1,\ldots,4$). $\hat{\sigma}^{2}$ increases with forecast horizon, reflecting the fact that the distant future is harder to forecast than the near future, and implying that the fitted equicorrelation $MSE$ signature plots, $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$, should shift upward with horizon, as confirmed in Figure 11. Comparison of the direct $\widehat{MSE}^{avg}_{NT}(k)$ signature plots in Figure 3 with the equicorrelation model-based $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ signature plots in Figure 11 reveals a remarkably good equicorrelation model fit. We emphasize this in Figure 12, in which we show side-by-side direct $\widehat{R}^{avg}_{NT}(k)$ (left column) and equicorrelation model-based $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$ (right column) signature plots.
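As a quick consistency check, the relative-MSE rows of Table 1 follow directly from $\hat{\rho}$ via $\widehat{R}^{avg}_{NT}(k;\hat{\rho})=\frac{1}{k}[1+(k-1)\hat{\rho}]$; a few lines of Python (ours) reproduce, for example, the growth $h=2$ and inflation $h=1$ entries.
\begin{verbatim}
# rho-hat values from Table 1 (growth, h = 2; inflation, h = 1).
for label, rho in [("growth h=2", 0.843), ("inflation h=1", 0.580)]:
    print(label, [round((1 + (k - 1) * rho) / k, 3) for k in (1, 5, 15)])
# growth h=2    [1.0, 0.874, 0.853]
# inflation h=1 [1.0, 0.664, 0.608]
\end{verbatim}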

4.2 Understanding the Near-Perfect Equicorrelation Fit

Here we present a closed-form solution for the direct crowd size signature plot. The result is significant in its own right and reveals why our numerical matching estimates for the equicorrelation model produce fitted signature plots that align so closely with direct signature plots. To maintain precision it will prove useful to state it as a formal theorem.

Theorem: Let $e_{t}$ be any covariance stationary $N\times 1$ vector with mean zero and covariance matrix $\Sigma$, given by

\[
\mathbf{\Sigma}=\begin{pmatrix}
\sigma_{1}^{2} & c_{12} & c_{13} & \cdots & c_{1N}\\
c_{21} & \sigma_{2}^{2} & c_{23} & \cdots & c_{2N}\\
c_{31} & c_{32} & \sigma_{3}^{2} & \cdots & c_{3N}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c_{N1} & c_{N2} & c_{N3} & \cdots & \sigma_{N}^{2}
\end{pmatrix},
\]

and define the $k$-average $MSE$,

\[
MSE^{avg}(k)=\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}} e_{it}\right)^{2}\right],
\]

where $g_{k}$ represents any subset of $e_{t}$ of size $k$ ($k\in[1,N]$). Then

\[
MSE^{avg}(k)=\frac{\overline{\sigma^{2}}}{k}\left(1+(k-1)\overline{\rho}\right) \tag{24}
\]

and

\[
R^{avg}(k)=\frac{MSE^{avg}(k)}{MSE^{avg}(1)}=\frac{1}{k}\left(1+(k-1)\overline{\rho}\right), \tag{25}
\]

where

\[
\overline{\sigma^{2}}=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{2}, \qquad
\overline{c}=\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}, \qquad
\overline{\rho}=\frac{\overline{c}}{\overline{\sigma^{2}}}.
\]

Proof: We have:

\begin{align*}
MSE^{avg}(k) &= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}} e_{it}\right)^{2}\right]\\
&= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left[\sum_{i\in g_{k}}\sigma_{i}^{2}+\sum_{i,j\in g_{k},\,i\neq j}c_{ij}\right]\\
&= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\left[\underbrace{\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i\in g_{k}}\sigma_{i}^{2}}_{\text{grand sum of variances}}+\underbrace{\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i,j\in g_{k},\,i\neq j}c_{ij}}_{\text{grand sum of covariances}}\right].
\end{align*}

First consider the term related to variance. Note that, of the $\binom{N}{k}$ groups, there are $\binom{N-1}{k-1}$ groups that include forecaster $i$, and hence $\sigma_{i}^{2}$. The sum of $\sum_{i\in g_{k}}\sigma_{i}^{2}$ over all groups $g_{k}$ is therefore

\[
\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i\in g_{k}}\sigma_{i}^{2}=\binom{N-1}{k-1}\sum_{i=1}^{N}\sigma_{i}^{2}. \tag{26}
\]

Now consider the term related to covariance. When summing $\sum_{i,j\in g_{k},\,i\neq j}c_{ij}$ across all groups $g_{k}$, covariances between all possible pairs of $e_{it}$ and $e_{jt}$ are accounted for. Because we are summing over all arbitrary groups $g_{k}$, each pair $(i,j)$ appears the same number of times in the grand summation. To compute this number we observe that each group $g_{k}$ contains $k(k-1)$ pairwise covariances and that there are $\binom{N}{k}$ possible groups. Hence the total number of individual covariance terms in the grand sum is $k(k-1)\binom{N}{k}$. The number of times that each individual covariance term $c_{ij}$ appears in the grand sum is $\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}$, where $\binom{N}{2}$ is the total number of distinct pairs $(i,j)$. The grand sum of covariances is therefore

\[
\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i,j\in g_{k},\,i\neq j}c_{ij}=\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}. \tag{27}
\]

Combining results (26) and (27), we have:

\begin{align*}
MSE^{avg}(k) &= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\binom{N-1}{k-1}\sum_{i=1}^{N}\sigma_{i}^{2}+\frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}\\
&= \frac{1}{k}\left[\left(\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{2}\right)+(k-1)\left(\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}\right)\right]\\
&= \frac{\overline{\sigma^{2}}}{k}\left[1+(k-1)\overline{\rho}\right].
\end{align*}

This completes the proof. $\square$
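To illustrate, the following Python sketch (ours; the value of $N$ and the random covariance matrix are arbitrary) verifies equation (24) by brute-force enumeration of all $\binom{N}{k}$ subsets.
\begin{verbatim}
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 7
A = rng.normal(size=(N, N))
Sigma = A @ A.T                                       # an arbitrary positive-definite covariance matrix

sigma2_bar = np.mean(np.diag(Sigma))                  # average variance
c_bar = np.mean(Sigma[np.triu_indices(N, k=1)])       # average pairwise covariance
rho_bar = c_bar / sigma2_bar

for k in range(1, N + 1):
    # Direct computation: average, over all size-k subsets, of the variance of the subset mean.
    direct = np.mean([Sigma[np.ix_(g, g)].sum() / k**2
                      for g in itertools.combinations(range(N), k)])
    closed_form = sigma2_bar / k * (1.0 + (k - 1.0) * rho_bar)   # equation (24)
    assert np.isclose(direct, closed_form)
print("Equation (24) verified for k = 1, ...,", N)
\end{verbatim}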

Several remarks are in order:

  (a) Equation (24) reveals that the direct crowd size signature plot is simply the equicorrelation model-based signature plot evaluated at particular values of the equicorrelation model parameters. This is true despite the fact that (24) does not require the forecast errors to be truly equicorrelated. Hence the “best-matching” equicorrelation model-based signature plot will always match the direct plot perfectly, regardless of whether the forecast errors are truly equicorrelated.

  (b) Equation (24) suggests an alternative, closed-form, matching estimator for the equicorrelation model (see the sketch after this list):
\[
\widehat{\sigma}^{2}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\sigma}_{i}^{2} \tag{28}
\]
\[
\widehat{\rho}=\frac{\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}\widehat{c}_{ij}}{\frac{1}{N}\sum_{i=1}^{N}\widehat{\sigma}_{i}^{2}}. \tag{29}
\]
  (c) Assessment of whether the forecast errors are truly equicorrelated could be done (under much stronger assumptions) by maximum-likelihood estimation of a dynamic-factor model, followed by likelihood-ratio tests of the restrictions implied by equicorrelation, as sketched in Appendix B for both weak and strong equicorrelation.
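A minimal Python sketch (ours) of the closed-form estimator (28)-(29), assuming a balanced $T\times N$ panel of forecast errors with no missing values (real SPF panels are unbalanced, so in practice the variances and covariances would be estimated pairwise):
\begin{verbatim}
import numpy as np

def closed_form_equicorrelation(errors):
    # errors: T x N array of forecast errors (rows are periods, columns are forecasters).
    Sigma_hat = np.cov(errors, rowvar=False)               # N x N sample covariance matrix
    N = Sigma_hat.shape[0]
    sigma2_hat = np.mean(np.diag(Sigma_hat))               # equation (28): average variance
    c_hat = np.mean(Sigma_hat[np.triu_indices(N, k=1)])    # average pairwise covariance
    return sigma2_hat, c_hat / sigma2_hat                  # (sigma^2-hat, rho-hat), equation (29)
\end{verbatim}
By the theorem, these are the parameter values at which the equicorrelation model-based signature plot coincides with the direct signature plot.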

5 Summary, Conclusions, and Directions for Future Research

We have studied the properties of macroeconomic survey forecast response averages as the number of survey respondents grows, characterizing the speed and pattern of the “gains from diversification” and their eventual decrease with “portfolio size” (the number of survey respondents) in both (1) the key real-world data-based environment of the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical model-based environment of equicorrelated forecast errors. We proceeded by proposing and comparing various direct and model-based “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We then estimated the equicorrelation model for growth and inflation forecast errors by choosing model parameters to minimize the divergence between direct and model-based signature plots.

The results indicate near-perfect equicorrelation model fit for both growth and inflation, which we explicated by showing analytically that, under conditions, the direct and fitted equicorrelation model-based signature plots are identical at a particular model parameter configuration, which we characterize. We find that the gains from diversification are greater for inflation forecasts than for growth forecasts, but that both the inflation and growth diversification gains nevertheless decrease quite quickly, so that fewer SPF respondents than currently used may be adequate.

Several directions for future research appear promising, including, in no particular order:

  (a) Instead of considering $MSE$s across $\binom{N}{k}$ possible $k$-average forecasts and averaging to obtain a “representative $k$-average” forecast $MSE$ as a function of $k$, one may want to consider “best $k$-average” forecast $MSE$ as a function of $k$, where the unique best $k$-average forecast is obtained in each period as the $k$-average that performed best historically.

  (b) One may want to allow for time-varying equicorrelation parameters, as $\sigma^{2}$ might, for example, move downward with the Great Moderation, while $\rho$ might move counter-cyclically (a small simulation sketch appears after this list). The strong equicorrelation model in dynamic-factor form becomes
\[
e_{it}=\delta_{t}z_{t}+w_{it}
\]
\[
z_{t}=\phi z_{t-1}+v_{t},
\]
where $w_{it}\sim iid(0,\sigma_{wt}^{2})$, $v_{t}\sim iid(0,\sigma_{v}^{2})$, and $w_{it}\perp v_{t},~\forall i,t$. Immediately,
\[
e_{t}\sim iid\left(0,~\Sigma_{t}(\rho_{t})\right),
\]
where
\[
\Sigma_{t}(\rho_{t})=\sigma_{t}^{2}\begin{pmatrix}
1 & \rho_{t} & \cdots & \rho_{t}\\
\rho_{t} & 1 & \cdots & \rho_{t}\\
\vdots & \vdots & \ddots & \vdots\\
\rho_{t} & \rho_{t} & \cdots & 1
\end{pmatrix}
\]
\[
\sigma_{t}^{2}=\delta_{t}^{2}\,var(z_{t})+\sigma_{wt}^{2}
\]
\[
\rho_{t}=\frac{\delta_{t}^{2}\,var(z_{t})}{\delta_{t}^{2}\,var(z_{t})+\sigma_{wt}^{2}}.
\]
  (c) One may want to complement our exploration of the U.S. SPF with a comparative exploration of the European SPF.\footnote{For an introduction to the European SPF, see the materials at https://data.ecb.europa.eu/methodology/survey-professional-forecasters-spf.} Doing so appears feasible but non-trivial, due to cross-survey differences in sample periods, economic indicator concepts (e.g., inflation), and timing conventions, and we reserve it for future work.
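As one illustration of direction (b), the following Python sketch (ours, with purely illustrative parameter values) simulates the dynamic-factor form of the model with constant parameters and checks the implied cross-sectional correlation $\rho=\delta^{2}var(z_{t})/(\delta^{2}var(z_{t})+\sigma_{w}^{2})$; making $\delta_{t}$ and $\sigma_{wt}$ functions of $t$ delivers the time-varying version.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
N, T = 40, 100_000                        # long sample so that sample moments are accurate
delta, phi, sigma_w, sigma_v = 1.0, 0.5, 1.0, 1.0

z = np.zeros(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + sigma_v * rng.normal()
e = delta * z[:, None] + sigma_w * rng.normal(size=(T, N))    # e_it = delta * z_t + w_it

var_z = sigma_v**2 / (1.0 - phi**2)
rho_implied = delta**2 * var_z / (delta**2 * var_z + sigma_w**2)
C = np.corrcoef(e, rowvar=False)
rho_sample = np.mean(C[np.triu_indices(N, k=1)])
print(round(rho_implied, 3), round(rho_sample, 3))            # both close to 0.571
\end{verbatim}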


Appendix A Data Definitions and Sources

We obtain U.S. quarterly level forecasts of real output and the GDP deflator from the Federal Reserve Bank of Philadelphia’s Individual Forecasts: Survey of Professional Forecasters (variables $RGDP$ and $PGDP$, respectively). We transform the level forecasts into annualized growth rate forecasts using:

\[
g_{t+h|t-1}=100\left(\left(\frac{f_{t+h|t-1}}{f_{t+h-1|t-1}}\right)^{4}-1\right), \tag{A1}
\]

where $f_{t+h|t-1}$ is a quarterly level forecast (either $RGDP$ or $PGDP$) for quarter $t+h$ made using information available in quarter $t-1$. For additional information, see https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/individual-forecasts.
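A one-line Python version of transformation (A1) (ours; the variable names are illustrative):
\begin{verbatim}
def annualized_growth(level_t_plus_h, level_t_plus_h_minus_1):
    # Equation (A1): annualized quarter-over-quarter growth rate, in percent.
    return 100.0 * ((level_t_plus_h / level_t_plus_h_minus_1) ** 4 - 1.0)

print(annualized_growth(20_100.0, 20_000.0))   # a 0.5 percent quarterly rise is about 2.0 percent annualized
\end{verbatim}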

We obtain the corresponding realizations from the Federal Reserve Bank of Philadelphia’s Forecast Error Statistics for the Survey of Professional Forecasters (December 2023 vintage). The realizations are reported as annualized growth rates, as in equation (A1) above, so there is no need for additional transformation. For additional information, see https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/error-statistics.

Appendix B Strong Equicorrelation, Weak Equicorrelation, and Factor Structure

Consider a standard model of dynamic single-factor structure,

\[
e_{it}=\delta_{i}z_{t}+w_{it} \tag{B1}
\]
\[
z_{t}=\phi z_{t-1}+v_{t}, \tag{B2}
\]

where $w_{it}\sim iid(0,\sigma_{wi}^2)$, $v_t\sim iid(0,\sigma_v^2)$, and $w_{it}\perp v_t$, $\forall i,t$. The implied forecast error covariance matrix $\Sigma$ fails to satisfy equicorrelation; that is,

\[
\Sigma~\neq~\sigma^2\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix},
\]

because the forecast error variances generally vary with $i$, and their correlations generally vary with $i$ and $j$. In particular, simple calculations reveal that

\[
\begin{split}
\sigma^2_i\equiv var(e_{i,t})&=\delta_i^2\,var(z_t)+\sigma_{wi}^2\\
&=\delta_i^2\left(var(z_t)+\frac{\sigma_{wi}^2}{\delta_i^2}\right),~\forall i,
\end{split}
\]

where $var(z_t)=\frac{\sigma_v^2}{1-\phi^2}$, and

\[
\begin{split}
\rho_{ij}\equiv corr(e_{i,t},e_{j,t})&=\frac{\delta_i\delta_j\,var(z_t)}{\sqrt{\delta_i^2\,var(z_t)+\sigma_{wi}^2}\sqrt{\delta_j^2\,var(z_t)+\sigma_{wj}^2}}\\
&=\frac{1}{\sqrt{1+\frac{\sigma_{wi}^2}{\delta_i^2\,var(z_t)}}\sqrt{1+\frac{\sigma_{wj}^2}{\delta_j^2\,var(z_t)}}}\,,~\forall i,j.
\end{split} \tag{B3}
\]

Nevertheless, certain simple restrictions on the dynamic factor model (DFM) (B1)-(B2) produce certain forms of equicorrelation. First, from equation (B3), it is apparent that $\rho_{ij}=\rho,~\forall i\neq j$ if and only if

\[
\frac{\sigma_{wi}^2}{\delta_i^2}=\frac{\sigma_{wj}^2}{\delta_j^2},~\forall i\neq j, \tag{B4}
\]

so that imposition of the constraint (B4) on the measurement equation (B1) produces a “weak” form of equicorrelation, with identical correlations ($\rho$) but potentially different forecast error variances ($\sigma_1^2,...,\sigma_N^2$). That is,

\[
\Sigma=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2&\cdots&\rho\sigma_1\sigma_N\\ \rho\sigma_2\sigma_1&\sigma_2^2&\cdots&\rho\sigma_2\sigma_N\\ \vdots&\vdots&\ddots&\vdots\\ \rho\sigma_N\sigma_1&\rho\sigma_N\sigma_2&\cdots&\sigma_N^2\end{pmatrix}.
\]

Second, it is also apparent from equation (B3) that if we impose the stronger restriction,

\[
\sigma_{wi}^2=\sigma_{wj}^2~~{\rm and}~~\delta_i=\delta_j,~\forall i,j, \tag{B5}
\]

which of course implies the weaker restriction (B4), then we obtain (“strong”) equicorrelation as we have defined it throughout this paper, with identical correlations ($\rho$) and identical forecast error variances ($\sigma^2$). That is,

\[
\Sigma~=~\sigma^2\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}.
\]
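As a quick numerical illustration, the following sketch (with arbitrary parameter values of our choosing) builds the covariance matrix implied by (B1)-(B2) and confirms that imposing (B4) equalizes the error correlations, while imposing (B5) additionally equalizes the error variances.

```python
import numpy as np

def implied_cov(delta, sigma_w2, var_z):
    """Forecast error covariance implied by the single-factor model (B1)-(B2)."""
    delta = np.asarray(delta, dtype=float)
    return var_z * np.outer(delta, delta) + np.diag(np.asarray(sigma_w2, dtype=float))

def corr(S):
    """Correlation matrix corresponding to a covariance matrix S."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

var_z = 2.0                                   # var(z_t) = sigma_v^2 / (1 - phi^2)

# Restriction (B4): sigma_wi^2 / delta_i^2 equal across i -> equal correlations,
# but generally different variances ("weak" equicorrelation).
delta = np.array([0.5, 1.0, 2.0])
sigma_w2 = 3.0 * delta**2
S_weak = implied_cov(delta, sigma_w2, var_z)
print(np.round(corr(S_weak), 4))              # identical off-diagonal correlations (0.4)
print(np.round(np.diag(S_weak), 4))           # unequal variances

# Restriction (B5): common delta and common sigma_w^2 -> "strong" equicorrelation.
S_strong = implied_cov([1.0, 1.0, 1.0], [3.0, 3.0, 3.0], var_z)
print(np.round(corr(S_strong), 4))            # identical correlations
print(np.round(np.diag(S_strong), 4))         # identical variances
```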

Although we do not pursue maximum-likelihood estimation in this paper, we note that one may estimate the unrestricted DFM (B1)-(B2) by exact Gaussian pseudo maximum likelihood (ML). The model is already in state-space form, so a single pass of the Kalman filter yields the innovations needed to construct and evaluate the likelihood, and it also handles the missing observations associated with survey entry and exit. One may also impose the weak or strong equicorrelation restrictions (B4) or (B5), respectively, and assess them using likelihood-ratio tests.
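To make the preceding paragraph concrete, here is a minimal sketch (our own Python code, not drawn from any existing package) of the Kalman-filter evaluation of the Gaussian (pseudo-) log-likelihood for (B1)-(B2), where NaN entries in the forecast-error panel mark survey entry and exit; the parameter names and the panel layout are our assumptions.

```python
import numpy as np

def dfm_loglik(E, delta, sigma_w2, phi, sigma_v2):
    """Gaussian log-likelihood of the single-factor model (B1)-(B2) via the Kalman filter.

    E        : (N, T) array of forecast errors; NaN = missing (survey entry/exit)
    delta    : (N,) factor loadings
    sigma_w2 : (N,) idiosyncratic shock variances
    phi, sigma_v2 : factor AR(1) coefficient (|phi| < 1) and innovation variance
    """
    delta = np.asarray(delta, dtype=float)
    sigma_w2 = np.asarray(sigma_w2, dtype=float)
    N, T = E.shape
    z, P = 0.0, sigma_v2 / (1.0 - phi**2)       # stationary initialization of z_t
    loglik = 0.0
    for t in range(T):
        # Prediction step for the scalar state z_t
        z, P = phi * z, phi**2 * P + sigma_v2
        obs = ~np.isnan(E[:, t])
        if not obs.any():
            continue
        d, e = delta[obs], E[obs, t]
        # Innovations and their covariance, using observed respondents only
        v = e - d * z
        F = P * np.outer(d, d) + np.diag(sigma_w2[obs])
        Finv = np.linalg.inv(F)
        _, logdet = np.linalg.slogdet(F)
        loglik += -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + v @ Finv @ v)
        # Update step
        K = P * (Finv @ d)                       # Kalman gain for the scalar state
        z = z + K @ v
        P = P - P * (d @ K)
    return loglik
```

Maximizing this function over $(\delta,\sigma_w^2,\phi,\sigma_v^2)$ with a numerical optimizer would deliver the ML estimates; imposing (B4) or (B5) amounts to restricting the parameter vector before optimization.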

Appendix C Optimal Combining Weights Under Weak Equicorrelation

Here we briefly consider the “weak equicorrelation” case, with common correlation $\rho$ and variances $\sigma_1^2,\sigma_2^2,...,\sigma_N^2$; that is,

\[
\Sigma=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2&\cdots&\rho\sigma_1\sigma_N\\ \rho\sigma_2\sigma_1&\sigma_2^2&\cdots&\rho\sigma_2\sigma_N\\ \vdots&\vdots&\ddots&\vdots\\ \rho\sigma_N\sigma_1&\rho\sigma_N\sigma_2&\cdots&\sigma_N^2\end{pmatrix},
\]

where $\rho\in\left]\frac{-1}{N-1},1\right[$. We can decompose the covariance matrix $\Sigma$ as

\[
\Sigma=DRD=\begin{pmatrix}\sigma_1&0&\cdots&0\\ 0&\sigma_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N\end{pmatrix}\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}\begin{pmatrix}\sigma_1&0&\cdots&0\\ 0&\sigma_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N\end{pmatrix},
\]

where $R$ is positive definite if and only if $\rho\in\left]\frac{-1}{N-1},1\right[$. The inverse of the covariance matrix is

\[
\Sigma^{-1}=D^{-1}R^{-1}D^{-1}=\begin{pmatrix}\sigma_1^{-1}&0&\cdots&0\\ 0&\sigma_2^{-1}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N^{-1}\end{pmatrix}\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}^{-1}\begin{pmatrix}\sigma_1^{-1}&0&\cdots&0\\ 0&\sigma_2^{-1}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N^{-1}\end{pmatrix},
\]

where

\[
R^{-1}=\frac{1}{1-\rho}I-\frac{\rho}{(1-\rho)(1+(N-1)\rho)}\iota\iota^{\prime},
\]

$I$ denotes the $N\times N$ identity matrix, and $\iota$ is an $N$-vector of ones.
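The closed form for $R^{-1}$ follows from a Sherman-Morrison-type rank-one update; a short numerical check (a sketch with arbitrary $N$ and $\rho$ of our choosing) is:

```python
import numpy as np

N, rho = 5, 0.3
iota = np.ones(N)
R = (1 - rho) * np.eye(N) + rho * np.outer(iota, iota)   # equicorrelation matrix
R_inv_closed = (np.eye(N) / (1 - rho)
                - rho / ((1 - rho) * (1 + (N - 1) * rho)) * np.outer(iota, iota))
print(np.allclose(R_inv_closed, np.linalg.inv(R)))        # True
```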

Recall that, as noted in the text, the optimal combining weight vector is

\[
\lambda^{*}=\left(\iota^{\prime}\Sigma^{-1}\iota\right)^{-1}\Sigma^{-1}\iota. \tag{C1}
\]

The first part of the optimal combining weight (C1) is

\[
\iota^{\prime}\Sigma^{-1}\iota=\frac{(1+(N-1)\rho)\left(\sum_{i=1}^{N}\sigma_i^{-2}\right)-\rho\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)}{(1-\rho)(1+(N-1)\rho)}, \tag{C2}
\]

and the second part is

\[
\Sigma^{-1}\iota=\frac{(1+(N-1)\rho)\boldsymbol{\sigma}^{-2}-\rho\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)\boldsymbol{\sigma}^{-1}}{(1-\rho)(1+(N-1)\rho)}, \tag{C3}
\]

where

\[
\boldsymbol{\sigma}^{-2}=\begin{pmatrix}\sigma_1^{-2}\\ \sigma_2^{-2}\\ \vdots\\ \sigma_N^{-2}\end{pmatrix}\quad\text{and}\quad\boldsymbol{\sigma}^{-1}=\begin{pmatrix}\sigma_1^{-1}\\ \sigma_2^{-1}\\ \vdots\\ \sigma_N^{-1}\end{pmatrix}.
\]

Inserting equations (C2) and (C3) into equation (C1), we get the optimal weight for the $i$th forecast as

\[
\lambda_i^{*}=\frac{\sigma_i^{-2}+\rho(N-2)\sigma_i^{-2}-\rho\left(\sum_{j\neq i}\sigma_i^{-1}\sigma_j^{-1}\right)}{\sum_{i=1}^{N}\left(\sigma_i^{-2}+\rho(N-2)\sigma_i^{-2}-\rho\left(\sum_{j\neq i}\sigma_i^{-1}\sigma_j^{-1}\right)\right)}. \tag{C4}
\]

To check the formula, note that for $N=2$ we obtain the standard Bates and Granger (1969) optimal bivariate combining weight,

\[
\lambda_1^{*}=\frac{\sigma_1^{-2}-\rho\sigma_1^{-1}\sigma_2^{-1}}{\sigma_1^{-2}+\sigma_2^{-2}-2\rho\sigma_1^{-1}\sigma_2^{-1}}=\frac{\sigma_2^{2}-\rho\sigma_1\sigma_2}{\sigma_1^{2}+\sigma_2^{2}-2\rho\sigma_1\sigma_2},
\]

and for any $N$, but with $\sigma_j^2=\sigma^2$ $\forall j$ (strong equicorrelation), we obtain equal weights,

\[
\lambda_i^{*}=\frac{(1-\rho)\sigma^{-2}}{N(1-\rho)\sigma^{-2}}=\frac{1}{N},~~\forall i.
\]
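As a final numerical check (a sketch with arbitrary illustrative variances and correlation), the closed-form weights (C4) can be compared with the direct computation (C1), and they collapse to $1/N$ when all variances are equal.

```python
import numpy as np

def optimal_weights_closed_form(sigma, rho):
    """Optimal combining weights under weak equicorrelation, equation (C4)."""
    s1, s2 = 1.0 / sigma, 1.0 / sigma**2                  # sigma_i^{-1} and sigma_i^{-2}
    N = len(sigma)
    cross = s1 * (s1.sum() - s1)                           # sum_{j != i} sigma_i^{-1} sigma_j^{-1}
    num = s2 + rho * (N - 2) * s2 - rho * cross
    return num / num.sum()

def optimal_weights_direct(sigma, rho):
    """Optimal combining weights from (C1) with Sigma = D R D."""
    N = len(sigma)
    D = np.diag(sigma)
    R = (1 - rho) * np.eye(N) + rho * np.ones((N, N))
    Sigma_inv = np.linalg.inv(D @ R @ D)
    iota = np.ones(N)
    return Sigma_inv @ iota / (iota @ Sigma_inv @ iota)

sigma = np.array([1.0, 1.5, 2.0, 3.0])
print(np.allclose(optimal_weights_closed_form(sigma, 0.4),
                  optimal_weights_direct(sigma, 0.4)))     # True
print(optimal_weights_closed_form(np.full(4, 2.0), 0.4))   # equal weights 1/N
```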

References

  • Aliber et al. (2023) Aliber, R.Z., C.P. Kindleberger, and R.N. McCauley (2023), Manias, Panics, and Crashes: A History of Financial Crises, 8th Edition, Palgrave Macmillan.
  • Batchelor and Dua (1995) Batchelor, R. and P. Dua (1995), “Forecaster Diversity and the Benefits of Combining Forecasts,” Management Science,  41, 68–75.
  • Bates and Granger (1969) Bates, J.M. and C.W.J. Granger (1969), “The Combination of Forecasts,” Operational Research Quarterly,  20, 451–468.
  • Clemen (1989) Clemen, R.T. (1989), “Combining Forecasts: A Review and Annotated Bibliography (With Discussion),” International Journal of Forecasting,  5, 559–583.
  • Croushore and Stark (2019) Croushore, D. and T. Stark (2019), “Fifty Years of the Survey of Professional Forecasters,” Economic Insights,  4, 1–11.
  • Diebold and Shin (2019) Diebold, F.X. and M. Shin (2019), “Machine Learning for Regularized Survey Forecast Combination: Partially-Egalitarian Lasso and its Derivatives,” International Journal of Forecasting,  35, 1679–1691.
  • Diebold et al. (2023) Diebold, F.X., M. Shin, and B. Zhang (2023), “On the Aggregation of Probability Assessments: Regularized Mixtures of Predictive Densities for Eurozone Inflation and Real Interest Rates,” Journal of Econometrics,  237, 105321.
  • Elliott (2011) Elliott, G. (2011), “Averaging and the Optimal Combination of Forecasts,” Manuscript, Department of Economics, UCSD.
  • Elliott and Timmermann (2016) Elliott, G. and A. Timmermann (2016), Economic Forecasting, Princeton University Press.
  • Engle and Kelly (2012) Engle, R.F. and B.T. Kelly (2012), “Dynamic Equicorrelation,” Journal of Business and Economic Statistics,  30, 212–228.
  • Genre et al. (2013) Genre, V., G. Kenny, A. Meyler, and A. Timmermann (2013), “Combining Expert Forecasts: Can Anything Beat the Simple Average?” International Journal of Forecasting,  29, 108–121.
  • Gourieroux et al. (1993) Gourieroux, C., A. Monfort, and E. Renault (1993), “Indirect Inference,” Journal of Applied Econometrics,  8, S85–S118.
  • Makridakis and Winkler (1983) Makridakis, S. and R.L. Winkler (1983), “Averages of Forecasts: Some Empirical Results,” Management Science,  29, 987–996.
  • Smith Jr (1993) Smith Jr, A.A. (1993), “Estimating Nonlinear Time-Series Models using Simulated Vector Autoregressions,” Journal of Applied Econometrics,  8, S63–S84.
  • Stock and Watson (2016) Stock, J.H. and M.W. Watson (2016), “Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics,” In J.B. Taylor and H. Uhlig (eds.), Handbook of Macroeconomics, vol. 2A, Elsevier, 415–526.
  • Surowiecki (2005) Surowiecki, J. (2005), The Wisdom of Crowds, Vintage Books.