On the Wisdom of Crowds
(of Economists)

Francis X. Diebold
University of Pennsylvania
and NBER
Aarón Mora
University of South Carolina
Minchul Shin
Federal Reserve Bank of Philadelphia

Abstract: We study the properties of macroeconomic survey forecast response averages as the number of survey respondents grows. Such averages are “portfolios” of forecasts. We characterize the speed and pattern of the gains from diversification and their eventual decrease with portfolio size (the number of survey respondents) in both (1) the key real-world data-based environment of the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical model-based environment of equicorrelated forecast errors. We proceed by proposing and comparing various direct and model-based “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We then estimate the equicorrelation model for growth and inflation forecast errors by choosing model parameters to minimize the divergence between direct and model-based signature plots. The results indicate near-perfect equicorrelation model fit for both growth and inflation, which we explicate by showing analytically that, under certain conditions, the direct and fitted equicorrelation model-based signature plots are identical at a particular model parameter configuration. We find that the gains from diversification are greater for inflation forecasts than for growth forecasts, but that both sets of gains nevertheless decrease quite quickly, so that fewer SPF respondents than currently used may be adequate.


Acknowledgments: For helpful comments and discussions we thank seminar and conference participants at the University of Washington and Cornell University, the International Conference on Macroeconomic Analysis and International Finance (Rethimno, Crete), and the Conference on Real-Time Data Analysis, Methods and Applications in Macroeconomics and Finance (Bank of Canada, Ottawa). For research assistance we thank Jacob Broussard. Any remaining errors are ours alone.


Key words: Macroeconomic surveys of professional forecasters, forecast combination, model averaging, equicorrelation


JEL codes: C5, C8, E3, E6


1 Introduction and Basic Framework

The wisdom of crowds, or lack thereof, is traditionally and presently a central issue in psychology, history, and political science; see for example Surowiecki (2005) regarding wisdom, and Aliber et al. (2023) regarding lack thereof. Perhaps most prominently, however, the wisdom of crowds is also—and again, traditionally and presently—a central issue in economics and finance, where heterogeneous information and expectations formation take center stage.\footnote{Interestingly, moreover, it also features prominently in new disciplines like machine learning and artificial intelligence, via forecast combination methods like ensemble averaging (e.g., Diebold et al., 2023).}

In this paper we focus on economics and finance, studying the “wisdom” of “crowds” of professional economists. We focus on the U.S. Survey of Professional Forecasters (SPF), which is important not only in facilitating empirical academic research in macroeconomics and financial economics, but also—and crucially—in guiding real-time policy, business, and investment management decisions.\footnote{On real-time policy and its evaluation, see John Taylor's inaugural NBER Feldstein Lecture at https://www.hoover.org/sites/default/files/gmwg-empirically-evaluating-economic-policy-in-real-time.pdf.}\footnote{For an introduction to the SPF, see the materials at https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/survey-of-professional-forecasters.}

In particular, we study SPF crowd behavior as crowd size grows, asking precisely the same sorts of questions of SPF “forecast portfolios” that one asks of financial asset portfolios: How quickly, and with what patterns, do diversification benefits become operative, and eventually dissipate, as portfolio size (the number of forecasters) grows, and why? Do the results differ across variables (e.g., growth vs. inflation), and if so, why? What are the implications for survey size and design? We are of course not the first to ask such questions. Classic early work on which we build includes, for example, Makridakis and Winkler (1983) and Batchelor and Dua (1995). We progress much farther, however, particularly as regards analytic characterization.

We answer the above questions using what we call “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We examine not only direct signature plots (empirically), but also model-based signature plots (analytically, based on a simple model of forecast-error equicorrelation), after which we proceed to estimate and assess the equicorrelation model.

To fix ideas and notation, let us sketch the basic framework with some precision. Let $N$ refer to a set of forecasts with $N\times 1$ zero-mean time-$t$ error vector $e_{t}$, $t=1,\ldots,T$, and let $k\leq N$ refer to a subset of forecasts. We consider $k$-forecast averages, and we seek to characterize $k$-forecast mean-squared forecast error ($MSE$). For a particular $k$-forecast average corresponding to group $g^{*}_{k}$, the forecast error is just the average of the individual forecast errors, so we have

$$\widehat{MSE}^{*}_{T}(k)=\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g^{*}_{k}}e_{it}\right)^{2}. \qquad (1)$$

For any choice of $k$, however, there are $\binom{N}{k}$ possible $k$-forecast averages. We focus on the $k$-average $\widehat{MSE}^{*}_{T}(k)$ given by equation (1), averaged across all groups of size $k$,

$$\widehat{MSE}^{avg}_{NT}(k)=\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right), \qquad (2)$$

where $g_{k}$ is an arbitrary member of the set of groups of size $k$.
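To make equation (2) concrete, here is a minimal Python sketch (our illustration, not code from the paper) that computes $\widehat{MSE}^{avg}_{NT}(k)$ exactly by enumerating all $\binom{N}{k}$ groups; the function name and the simulated errors are ours, and the exact enumeration is feasible only for small $N$, which is precisely why the approximation of section 2.2 is needed.

\begin{verbatim}
# Exact computation of equation (2) for small N: average the T-sample MSE of
# the k-average forecast error over all C(N, k) groups of forecasters.
import numpy as np
from itertools import combinations
from math import comb

def mse_avg_exact(errors, k):
    """errors: T x N array of individual forecast errors e_{it}."""
    T, N = errors.shape
    total = 0.0
    for group in combinations(range(N), k):            # every group g_k of size k
        avg_err = errors[:, list(group)].mean(axis=1)  # k-average error, each t
        total += np.mean(avg_err ** 2)                 # MSE^*_T(k) for this group
    return total / comb(N, k)                          # average over all groups

# Illustrative use with simulated errors (N kept small so enumeration is cheap):
rng = np.random.default_rng(0)
e = rng.standard_normal((160, 8))
print([round(mse_avg_exact(e, k), 3) for k in (1, 2, 4, 8)])
\end{verbatim}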

Among other things, we are interested in:

(a) Tracking and visualizing $\widehat{MSE}^{avg}_{NT}(k)$ as $k$ grows (“$\widehat{MSE}^{avg}_{NT}(k)$ crowd size signature plots”);

(b) Tracking and visualizing the change (improvement) in $\widehat{MSE}^{avg}_{NT}(k)$ from adding one more forecast to the pool (i.e., moving from $k$ to $k+1$ forecasts),
$$\widehat{DMSE}^{avg}_{NT}(k)=\widehat{MSE}^{avg}_{NT}(k)-\widehat{MSE}^{avg}_{NT}(k+1), \qquad (3)$$
as $k$ grows (“$\widehat{DMSE}^{avg}_{NT}(k)$ crowd size signature plots”);

(c) Tracking and visualizing the average performance from $k$-averaging ($\widehat{MSE}^{avg}_{NT}(k)$) relative to the performance from no averaging,
$$\widehat{R}^{avg}_{NT}(k)=\frac{\widehat{MSE}^{avg}_{NT}(k)}{\widehat{MSE}^{avg}_{NT}(1)} \qquad (4)$$
as $k$ grows (“$\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots”, where we use “$R$” to denote “ratio”);

(d) Tracking and visualizing not just mean squared-error performance as $k$ grows, as in all of the above signature plots, but rather the complete distributional squared-error performance as $k$ grows (“$\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots”, where $\widehat{f}^{avg}_{NT}(k)$ is the average empirical distribution of squared $k$-average forecast errors for crowd size $k$);

(e) Understanding paths and patterns in the above signature plots as $k$ grows, whether obtained by direct analysis of the SPF data, or by analysis of (equicorrelation) models fit to the SPF data;

(f) Assessing the equicorrelation model by comparing direct and model-based SPF signature plot estimates;

(g) Understanding similarities and differences in results across variables (growth vs. inflation);

(h) Drawing practical implications for SPF design.

We proceed as follows. In section 2 we study the SPF, and we estimate its crowd size signature plots directly. In section 3 we study an equicorrelation model, and we characterize its crowd size signature plots analytically for any parameter configuration. In section 4 we estimate and assess the equicorrelation model by choosing its parameters to minimize divergence between direct and model-based signature plots. In section 5 we conclude and sketch several directions for future research.

2 Direct Crowd Size Signature Plots

Figure 1: SPF Participation
Notes: We show the number of participants in the U.S. Survey of Professional Forecasters, 1968Q4-2023Q2.

Figure 2: SPF Forecast Error Means and Standard Deviations
Panels: Growth Mean, Growth S.D., Inflation Mean, Inflation S.D.
Notes: We show time series of cross-sectional means and standard deviations of individual 1-step-ahead forecast errors, 1968Q4-2023Q2. Gray shaded regions denote recessions.

The U.S. Survey of Professional Forecasters is a quarterly survey covering several U.S. macroeconomic variables. It was started in 1968Q4 and is currently conducted and maintained by the Federal Reserve Bank of Philadelphia.\footnote{For a recent introduction see Croushore and Stark (2019).} In Figure 1 we show the evolution of the number of survey participants, which declined until 1990Q2, when the Federal Reserve Bank of Philadelphia took control of the survey; since then the survey has had approximately 40 participants. Participants stayed for 15 quarters on average, with a minimum of 1 quarter and a maximum of 125 quarters.

We analyze SPF point forecasts for real output growth (“growth”) and GDP deflator inflation (“inflation”), for forecast horizons $h=1,2,3,$ and $4$, corresponding to short-, medium-, and longer-term forecasts.\footnote{The SPF contains quarterly level forecasts of real GDP and the GDP implicit price deflator. We transform the level forecasts into growth and inflation forecasts by computing annualized quarter-on-quarter growth rates, and we compute the corresponding forecast errors using realized values as of December 2023. See Appendix A for details.} Our sample period is 1968Q4-2023Q2, during which the survey had 38 participants per survey on average.

2.1 SPF Forecast Errors

Because individual forecast errors drive our analysis, as per equation (2), we begin by examining their evolving period-by-period cross-sectional distributions. In Figure 2 we show the time series of cross-sectional means and standard deviations. Several features are apparent:

(a) The Great Moderation is clearly reflected in both the growth and inflation error distributions, which show noticeably reduced variability from the end of the Volcker Recession to the start of the Great Recession.

(b) Growth tends to be over-predicted when entering recessions; that is, the mean of the growth error (actual minus predicted) distribution tends to be negative. Hence recessions tend to catch forecasters by surprise. The Pandemic Recession provides the most extreme example, as the mean growth error plunges.

(c) Growth is, however, sometimes systematically under-predicted during recoveries. The Pandemic Recession again provides the most extreme example, as the mean growth error leaps skyward.

(d) Inflation shows little such systematic over- or under-prediction when entering or exiting recessions, except for the entry into the Oil Shock Recession, when inflation was noticeably under-predicted.

(e) The variability of the growth error distribution increases during recessions, most notably during the Great Recession and the Pandemic Recession, reflecting greater disagreement among forecasters. The same is true of the inflation error distribution during those two recessions.

(f) The unusual behavior of the inflation error distribution following the Pandemic Recession is clearly revealed. The mean error there is always positive (i.e., forecasters tended to under-predict), with the amount of under-prediction first growing and then shrinking. Inflation error variability follows a similar path.

2.2 Directly-Estimated Crowd Size Signature Plots

Figure 3: Direct $\widehat{MSE}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{MSE}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

Figure 4: Direct $\widehat{DMSE}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{DMSE}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

In principle we want to compute $\widehat{MSE}^{avg}_{NT}(k)$, but that is impossible in practice unless $N$ is very small, due to the potentially huge number of distinct $k$-average forecasts.\footnote{For example, for $N=40$, which is a realistic value for surveys of forecasters, and $k=20$, we obtain $\binom{N}{k}=1.4\times 10^{11}$.} Hence we proceed by approximating $\widehat{MSE}^{avg}_{NT}(k)$ as follows (see the sketch after this list):

(a) Randomly select a $k$-average forecast $g^{*}_{k}$, and calculate $\widehat{MSE}^{*}_{T}(k)$.

(b) Repeat $B$ times, and average the $\widehat{MSE}^{*}_{T}(k)$ values across the $B$ draws, where $B$ is large, but not so large as to be computationally intractable.\footnote{In this paper we use $B=30{,}000$.}
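The following minimal Python sketch (again our illustration, with our own function names) implements steps (a)-(b) for an arbitrary $T\times N$ array of forecast errors; the paper uses $B=30{,}000$, and the smaller $B$ below is only to keep the example fast.

\begin{verbatim}
# Monte Carlo approximation of MSE^avg_NT(k): draw B random groups of size k,
# compute each group's T-sample MSE, and average across the B draws.
import numpy as np

def mse_avg_mc(errors, k, B=30_000, rng=None):
    """errors: T x N array of forecast errors e_{it}."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, N = errors.shape
    total = 0.0
    for _ in range(B):
        group = rng.choice(N, size=k, replace=False)  # step (a): random group g*_k
        avg_err = errors[:, group].mean(axis=1)       # k-average forecast error
        total += np.mean(avg_err ** 2)                # MSE^*_T(k) for this draw
    return total / B                                  # step (b): average over draws

# Example: an approximate signature plot on simulated errors.
rng = np.random.default_rng(1)
e = rng.standard_normal((160, 40))
signature = [mse_avg_mc(e, k, B=2_000, rng=rng) for k in range(1, 21)]
\end{verbatim}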

We show direct crowd size signature plots for growth and inflation, for horizons $h=1,2,3,4$, in Figure 3 ($\widehat{MSE}^{avg}_{NT}(k)$ plots) and Figure 4 ($\widehat{DMSE}^{avg}_{NT}(k)$ plots). Several features are apparent:

(a) For both growth and inflation, the $\widehat{MSE}^{avg}_{NT}(k)$ signature plot is lowest for $h=1$, with the signature plots for $h=2$, $3$, and $4$ progressively farther above the $h=1$ plot in roughly parallel upward shifts, reflecting the fact that the near future is generally easier to predict than the more-distant future.

(b) The growth $\widehat{MSE}^{avg}_{NT}(k)$ signature plot does not rise much from $h=3$ to $h=4$, in contrast to the inflation $\widehat{MSE}^{avg}_{NT}(k)$ signature plot, suggesting that growth predictability drops with horizon more quickly than inflation predictability, effectively vanishing by $h=3$.

(c) For both growth and inflation and all forecast horizons, the reduction in $\widehat{MSE}^{avg}_{NT}(k)$ from $k=1$ to $k=5$ dwarfs the improvement from moving from $k=6$ to $k=20$, as visually emphasized by the $\widehat{DMSE}^{avg}_{NT}(k)$ signature plots in Figure 4. Hence there is little benefit from adding representative forecasters to the pool beyond $k=5$.

(d) For both growth and inflation and all forecast horizons, the $\widehat{DMSE}^{avg}_{NT}(k)$ signature plots are approximately the same across horizons (i.e., no upward shifts), which is expected because the $\widehat{MSE}^{avg}_{NT}(k)$ signature plots shift with horizon in approximately parallel fashion, leaving the “first derivative” ($\widehat{DMSE}^{avg}_{NT}(k)$) unchanged.

Figure 5: Direct $\widehat{R}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show direct $\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

We show direct growth and inflation $\widehat{R}^{avg}_{NT}(k)$ crowd size signature plots in Figure 5. They are simply the $\widehat{MSE}^{avg}_{NT}(k)$ plots of Figure 3, scaled by $\widehat{MSE}^{avg}_{NT}(1)$ (the benchmark MSE corresponding to no averaging), so that $\widehat{R}^{avg}_{NT}(1)\equiv 1$. The $\widehat{R}^{avg}_{NT}(k)$ plots facilitate $\widehat{MSE}^{avg}_{NT}(k)$ comparisons across growth and inflation, particularly when a common vertical scale is used, as in Figure 5. It is immediately apparent that the growth and inflation $\widehat{R}^{avg}_{NT}(k)$ plots asymptote to very different levels as $k$ increases – approximately 80% for growth and 60% for inflation – which highlights an important result not discussed thus far: the benefits of SPF “portfolio diversification” appear substantially greater for inflation than for growth, presumably due to lower correlation among the inflation forecast errors. We return to this issue when we study and estimate models of equicorrelated forecast errors in sections 3 and 4 below.

Figure 6: Direct $\widehat{f}^{avg}_{NT}(k)$ Crowd Size Signature Plots (Growth and Inflation panels)
Notes: We show $\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots for SPF growth and inflation forecasts at horizon $h=1$, for group sizes $k=1,2,\ldots,20$. For each $k$, we produce the figure by randomly drawing $B=30{,}000$ groups of size $k$ for each $t=1,\ldots,T$.

Finally, we show growth and inflation $\widehat{f}^{avg}_{NT}(k)$ crowd size signature plots in Figure 6. In particular, we show boxplots of squared 1-step-ahead $k$-average forecast errors for $k=1,\ldots,20$.\footnote{The boxplots display the median, the first and third quartiles, the lower extreme value (first quartile minus 1.5 times the interquartile range), the upper extreme value (third quartile plus 1.5 times the interquartile range), and outliers.} Both the growth and inflation forecast error distributions are highly right-skewed for small $k$, but they become less variable and more symmetric as $k$ grows and the central limit theorem (CLT) becomes operative, which happens noticeably less quickly for growth than for inflation. It is interesting to note, moreover, that for both growth and inflation the worst-case (maximum) $MSE$ decreases in $k$ (“bad luck” resulting in high $MSE$ happens easily for small $k$ but is averaged away as $k$ increases and the CLT becomes operative), whereas the best-case (minimum) $MSE$ increases in $k$ (“good luck” resulting in low $MSE$ likewise happens easily for small $k$ but is averaged away as $k$ increases).
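A minimal sketch of the distributional object behind Figure 6, under our own naming conventions: rather than averaging the squared $k$-average errors into a single MSE, one keeps the whole pooled collection of squared errors for each $k$ and summarizes its quartiles (the ingredients of a boxplot).

\begin{verbatim}
# Pool squared k-average forecast errors across random groups and dates, then
# summarize the pooled distribution; quartiles shown here, as in a boxplot.
import numpy as np

def squared_error_distribution(errors, k, B=2_000, rng=None):
    """errors: T x N array of forecast errors; returns pooled squared k-average errors."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, N = errors.shape
    pooled = []
    for _ in range(B):
        group = rng.choice(N, size=k, replace=False)
        pooled.append(errors[:, group].mean(axis=1) ** 2)  # squared k-average errors, all t
    return np.concatenate(pooled)

rng = np.random.default_rng(2)
e = rng.standard_normal((160, 40))
for k in (1, 5, 20):
    q1, med, q3 = np.percentile(squared_error_distribution(e, k, rng=rng), [25, 50, 75])
    print(f"k={k:2d}  Q1={q1:.3f}  median={med:.3f}  Q3={q3:.3f}")
\end{verbatim}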

3 Model-Based Crowd Size Signature Plots

Having empirically characterized crowd size signature plots directly in the SPF data, we now proceed to characterize them analytically in a simple covariance-stationary equicorrelation model, in which $e_{t}\sim(0,\Sigma)$, where $0$ is the $N\times 1$ zero vector and $\Sigma$ is an $N\times N$ forecast-error covariance matrix displaying equicorrelation, by which we mean that all variances are identical and all implied correlations are identical.

3.1 Equicorrelated Forecast Errors

A trivial equicorrelation example occurs when $\Sigma=\sigma^{2}I$, where $I$ denotes the $N\times N$ identity matrix, so that all variances are equal ($\sigma^{2}$), and all correlations are equal (0). Of course the zero-correlation case is unrealistic, because, for example, economic forecast errors are invariably positively correlated due to overlap of information sets, but it will serve as a useful benchmark, so we begin with it.

Simple averaging is the fully optimal forecast combination in the zero-correlation environment, which is obvious since the forecasts are exchangeable. More formally, the optimality of simple averaging (equal combining weights) follows from the multivariate Bates and Granger (1969) formula for $MSE$-optimal combining weights,

$$\lambda^{*}=\left(\iota^{\prime}\Sigma^{-1}\iota\right)^{-1}\Sigma^{-1}\iota, \qquad (5)$$

where $\iota$ is a conformable ($N\times 1$) column vector of ones. For $\Sigma=\sigma^{2}I$ the optimal weights collapse to

$$\lambda^{*}=(\sigma^{-2}N)^{-1}\sigma^{-2}\iota=\frac{1}{N}\iota.$$

Analytical results for $MSE^{avg}_{NT}(k;\sigma)$ are straightforward for simple averages in the zero-correlation environment. Immediately, for $T$ sufficiently large, we have

\begin{align}
MSE^{avg}_{NT}(k;\sigma) &= \operatorname*{plim}_{T\rightarrow\infty}\left(\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right)\right) \qquad (6)\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\, E\left[\left(\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\sum_{i\in g_{k}} E\left[e_{it}^{2}\right]\\
&= \frac{\sigma^{2}}{k}.
\end{align}

Moreover,

$$DMSE^{avg}_{NT}(k;\sigma)=\frac{\sigma^{2}}{k(k+1)} \qquad (7)$$

and

$$R^{avg}_{NT}(k)=\frac{1}{k}. \qquad (8)$$

(Notice that $\sigma$ cancels in the $R^{avg}_{NT}(k;\sigma)$ calculation, so we simply write $R^{avg}_{NT}(k)$.)

We now move to a richer equicorrelation case with equal but nonzero correlations, but still with equal variances (we refer to it as “strong equicorrelation”, or simply “equicorrelation” when the meaning is clear from context), so that instead of $\Sigma=\sigma^{2}I$ we have

$$\Sigma=\sigma^{2}R, \qquad (9)$$

where

$$R=\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}, \qquad (10)$$

and $\rho\in\left]\frac{-1}{N-1},1\right[$.\footnote{$R$ is positive definite if and only if $\rho\in\left]\frac{-1}{N-1},1\right[$. See Lemma 2.1 of Engle and Kelly (2012).} Recent work, in particular Engle and Kelly (2012), has made use of equicorrelation in the context of modeling multivariate financial asset return volatility.

Importantly, the optimality of simple averaging under zero correlation is preserved under equicorrelation.\footnote{That is, equicorrelation is sufficient for the optimality of simple averaging. Elliott (2011) shows that a necessary and sufficient condition for optimality of simple averaging is that the row sums of $\Sigma$ be equal. Equicorrelation is one such case, although there are of course others, obtained by manipulating correlations in their relation to variances to keep row sums equal, but none are nearly so compelling and readily interpretable as equicorrelation.} To see why, consider the inverse covariance matrix in the expression for the optimal combining weight vector, (5). In the equicorrelation case we have

$$\Sigma^{-1}=\frac{1}{\sigma^{2}}R^{-1}, \qquad (11)$$

where\footnote{See Lemma 2.1 of Engle and Kelly (2012).}

$$R^{-1}=\frac{1}{1-\rho}I-\frac{\rho}{(1-\rho)(1+(N-1)\rho)}\iota\iota^{\prime}. \qquad (12)$$

Then, using equation (12), the first part of the optimal combining weight (5) is

$$\iota^{\prime}\Sigma^{-1}\iota=\frac{N}{\sigma^{2}}\,\frac{(1+(N-1)\rho)-\rho N}{(1-\rho)(1+(N-1)\rho)}, \qquad (13)$$

and the second part is

$$\Sigma^{-1}\iota=\frac{1}{\sigma^{2}}\,\frac{(1+(N-1)\rho)-\rho N}{(1-\rho)(1+(N-1)\rho)}\,\iota. \qquad (14)$$

Inserting equations (13) and (14) into equation (5) yields

$$\lambda^{*}=\frac{1}{N}\iota, \qquad (15)$$

establishing the optimality of equal weights.
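As a quick numerical check (our illustration, with the values of $N$, $\sigma$, and $\rho$ assumed purely for the example), one can construct an equicorrelation matrix and confirm both that the Bates-Granger weights in (5) reduce to $\iota/N$ and that the closed form (12) indeed inverts $R$:

\begin{verbatim}
# Numerical check: under equicorrelation the optimal weights are 1/N, and the
# closed-form inverse in equation (12) matches a direct matrix inversion.
import numpy as np

N, sigma, rho = 40, 1.5, 0.5                          # assumed illustrative values
iota = np.ones((N, 1))
R = (1 - rho) * np.eye(N) + rho * np.ones((N, N))     # equicorrelation matrix, eq. (10)
Sigma = sigma**2 * R                                  # eq. (9)

Sigma_inv = np.linalg.inv(Sigma)
lam = Sigma_inv @ iota / (iota.T @ Sigma_inv @ iota)  # Bates-Granger weights, eq. (5)
print(np.allclose(lam, iota / N))                     # True: equal weights are optimal

R_inv = np.eye(N) / (1 - rho) - rho / ((1 - rho) * (1 + (N - 1) * rho)) * np.ones((N, N))
print(np.allclose(R_inv, np.linalg.inv(R)))           # True: eq. (12) holds
\end{verbatim}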

Having now introduced equicorrelation and shown that it implies optimality of simple average forecast combinations, it is of interest to assess whether it is a potentially reasonable model for sets of survey forecast errors. The answer is yes. First, obviously but importantly, the information sets of economic forecasters are quite highly overlapping, so it is not an unreasonable approximation to suppose that various pairs of forecast errors will be positively and similarly correlated.

Second, less obviously but also importantly, equicorrelation is closely linked to factor structure, which is a great workhorse of modern macroeconomics and business-cycle analysis (e.g., Stock and Watson, 2016). In particular, equicorrelation arises when forecast errors have single-factor structure with equal factor loadings and equal idiosyncratic shock variances, as in:

\begin{align}
e_{it} &= \delta z_{t}+w_{it} \qquad (16)\\
z_{t} &= \phi z_{t-1}+v_{t},
\end{align}

where $w_{it}\sim iid(0,\sigma_{w}^{2})$, $v_{t}\sim iid(0,\sigma_{v}^{2})$, and $w_{it}\perp v_{t}$, $\forall\, i,t$, $i=1,\ldots,N$, $t=1,\ldots,T$. In Appendix B we also explore a less-restrictive form of factor structure that produces a less-restrictive form of equicorrelation (“weak equicorrelation”).
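To make the link explicit (a short derivation added here for completeness, assuming $|\phi|<1$ so that $z_{t}$ is covariance stationary with variance $\sigma_{z}^{2}=\sigma_{v}^{2}/(1-\phi^{2})$, and assuming $w_{it}$ is uncorrelated with $z_{t}$, as the orthogonality conditions above imply), the factor model (16) delivers
$$\operatorname{Var}(e_{it})=\delta^{2}\sigma_{z}^{2}+\sigma_{w}^{2}\equiv\sigma^{2},\qquad \operatorname{Cov}(e_{it},e_{jt})=\delta^{2}\sigma_{z}^{2}\;\;(i\neq j),\qquad \rho=\frac{\delta^{2}\sigma_{z}^{2}}{\delta^{2}\sigma_{z}^{2}+\sigma_{w}^{2}},$$
so that all error variances are equal and all pairwise error correlations are equal, as equicorrelation requires.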

Finally, a large literature from the 1980s onward documents the routine outstanding empirical performance of simple average forecast combinations, despite the fact that simple averages are not optimal in general (e.g., Clemen, 1989; Genre et al., 2013; Elliott and Timmermann, 2016; Diebold and Shin, 2019). As we have seen, however, equicorrelation is sufficient (and almost necessary) for optimality of simple averages, so that if simple averages routinely perform well, then the equicorrelation model is routinely reasonable – and the natural model to pair with the simple averages embodied in the SPF.

3.2 Analytic Equicorrelation Crowd Size Signature Plots

Analytical results for $MSE^{avg}_{NT}(\cdot)$, $DMSE^{avg}_{NT}(\cdot)$, and $R^{avg}_{NT}(\cdot)$ are easy to obtain under equicorrelation, just as they were under zero correlation. Immediately, for $T$ sufficiently large,

\begin{align}
MSE^{avg}_{NT}(k;\rho,\sigma) &= \operatorname*{plim}_{T\rightarrow\infty}\left(\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right)\right) \qquad (17)\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\, E\left[\left(\sum_{i\in g_{k}}e_{it}\right)^{2}\right]\\
&= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\frac{1}{k^{2}}\left[k\sigma^{2}+k(k-1)\,Cov(e_{it},e_{jt})\right]\quad\text{for } i\neq j\\
&= \frac{\sigma^{2}}{k}\left[1+(k-1)\rho\right].
\end{align}

Moreover,

$$DMSE^{avg}_{NT}(k;\rho,\sigma)=\frac{\sigma^{2}}{k(k+1)}(1-\rho) \qquad (18)$$

and

$$R^{avg}_{NT}(k;\rho)=\frac{1}{k}\left[1+(k-1)\rho\right]. \qquad (19)$$

(Note that $\sigma$ cancels in the $R^{avg}_{NT}(k;\rho,\sigma)$ calculation, so we simply write $R^{avg}_{NT}(k;\rho)$.) If $\rho=0$, the result (17) for $MSE^{avg}_{NT}(k;\rho,\sigma)$ in the equicorrelation case of course collapses to the earlier result (6) for $MSE^{avg}_{NT}(k;\sigma)$ in the zero-correlation case.
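The following minimal simulation sketch (our illustration, with all parameter values assumed for the example) generates equicorrelated errors from a single-factor representation and checks that simulated $k$-average MSEs track the closed form $\frac{\sigma^{2}}{k}\left[1+(k-1)\rho\right]$:

\begin{verbatim}
# Simulation check of equation (17): equicorrelated errors have k-average MSE
# equal to sigma^2/k * (1 + (k-1)*rho). Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
N, T, sigma, rho = 40, 100_000, 1.0, 0.5                       # assumed example values
common = np.sqrt(rho) * sigma * rng.standard_normal(T)         # common component
idio = np.sqrt(1 - rho) * sigma * rng.standard_normal((T, N))  # idiosyncratic components
e = common[:, None] + idio                                     # Var = sigma^2, Corr = rho

for k in (1, 2, 5, 10, 20):
    sim = np.mean(e[:, :k].mean(axis=1) ** 2)    # any k forecasters will do: exchangeable
    closed = sigma**2 / k * (1 + (k - 1) * rho)  # equation (17)
    print(f"k={k:2d}  simulated={sim:.4f}  closed-form={closed:.4f}")
\end{verbatim}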

Figure 7: Theoretical Equicorrelation $MSE^{avg}_{NT}(k;\rho,1)$ Crowd Size Signature Plots
Notes: We show equicorrelation $MSE^{avg}_{NT}(k;\rho,1)$ crowd size signature plots for group sizes $k=1,\ldots,20$, and equicorrelations $\rho=0.1,0.5,0.9$.

Figure 8: Theoretical Equicorrelation $MSE^{min}_{NT}(k;.5,1)$, $MSE^{avg}_{NT}(k;.5,1)$, and $MSE^{max}_{NT}(k;.5,1)$ Crowd Size Signature Plots
Notes: The data generating process is the equicorrelation model given by equations (9)-(10), with $\rho=.5$, $\sigma=1$, $N=40$, and $T=160$. For each group size $k$, 30,000 groups of size $k$ were drawn at random for each $t=1,\ldots,T$ to produce the figure.

In Figure 7 we show $MSE^{avg}_{NT}(k;\rho,\sigma)$ as a function of $k$, for various equicorrelations $\rho$, with $\sigma=1$. From equation (17), the height of each curve at $k=1$ is simply $\sigma=1$, and the curves decrease for any fixed $\rho$ to a limiting value ($\rho$) as the combining pool grows ($k\rightarrow\infty$).\footnote{Indeed under equicorrelation with $\sigma=1$, as here, $MSE^{avg}_{NT}(k;\rho,\sigma)=R^{avg}_{NT}(k;\rho,\sigma)$.} Indeed the gains from increasing $k$ are initially large but decrease quickly. The $MSE$ improvement, for example, in moving from $k=1$ to $k=5$ consistently dwarfs that of moving from $k=5$ to $k=20$.

Overall, then, the value of increasing the pool size (i.e., increasing $k$) is highest when $k$ is small (small pool), when $\rho$ is low (weakly-correlated forecast errors), or when $\sigma^{2}$ is high (volatile forecast errors). In particular, for realistic values of $\rho$, around 0.5, say, most gains from increasing $k$ are obtained by $k=5$.
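For concreteness, the following short Python sketch (ours, with purely illustrative parameter values) evaluates the equicorrelation signature plot $\frac{\sigma^{2}}{k}[1+(k-1)\rho]$ and the associated gains; it reproduces, for example, the fact that the $k=1$ to $k=5$ improvement dwarfs the $k=5$ to $k=20$ improvement at $\rho=0.5$.
\begin{verbatim}
import numpy as np

def mse_avg(k, rho, sigma2=1.0):
    # Equicorrelation k-average forecast-error MSE: (sigma^2 / k) * [1 + (k - 1) * rho].
    return sigma2 / k * (1.0 + (k - 1.0) * rho)

ks = np.arange(1, 21)
for rho in (0.1, 0.5, 0.9):
    curve = mse_avg(ks, rho)
    print(f"rho={rho}: MSE(1)={curve[0]:.3f}  MSE(5)={curve[4]:.3f}  MSE(20)={curve[19]:.3f}  "
          f"gain 1->5={curve[0] - curve[4]:.3f}  gain 5->20={curve[4] - curve[19]:.3f}")
\end{verbatim}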

Several additional remarks are in order:

  (a) The fact that, for realistic values of $\rho$, most $MSE^{avg}_{NT}(k;\rho,1)$ gains from increasing $k$ are obtained by $k=5$ does not necessarily indicate that typical surveys use too many forecasters. $MSE^{avg}_{NT}(k;\rho,1)$ is an average across all $k$-forecast combinations, and the best and worst $k$-average combinations, for example, will have very different MSEs. Figure 8 speaks to this; it shows $MSE^{min}_{NT}(k;\rho,\sigma)$, $MSE^{avg}_{NT}(k;\rho,\sigma)$, and $MSE^{max}_{NT}(k;\rho,\sigma)$ under equicorrelation with $\rho=0.5$ and $\sigma=1$, for $k=1,\ldots,20$.\footnote{We use $N=40$ as an approximation to the average number of forecasters participating in the SPF in any given quarter, and we use $T=160$ to mimic the total sample size when working with 40 years of quarterly data, as in the SPF.}

  (b) The equicorrelation case is the only one for which analytic results are readily obtainable. For example, even if we maintain the assumption of equal correlations but simply allow different forecast error variances (“weak equicorrelation”), the $MSE$ of the $k$-person average forecast becomes a function of $(k,\rho,\sigma_{1}^{2},\ldots,\sigma_{N}^{2})$, and little more can be said.\footnote{See Appendix C for derivation of optimal combining weights in the weak equicorrelation case.}

  (c) As mentioned earlier, the equicorrelation case naturally matches the provision of survey averages, because in that case simple averages are optimal. Hence, as we now proceed to a model-based empirical analysis of real forecasters, we work with the equicorrelation model, asking what values of $\rho$ and $\sigma$ make the equicorrelation model-based $MSE^{avg}_{NT}(k;\rho,\sigma)$ signature plot as close as possible to the direct $\widehat{MSE}^{avg}_{NT}(k)$ signature plot.

4 Estimating the Equicorrelation Model

Here we estimate the equicorrelation model by choosing its parameters $\rho$ and $\sigma$ to make the equicorrelation $MSE^{avg}_{NT}(k;\rho,\sigma)$ as close as possible to the SPF $\widehat{MSE}^{avg}_{NT}(k)$. This estimation strategy is closely related to, but different from, GMM estimation. Rather than matching model and data moments, it matches more interesting and interpretable functions of those moments, namely model-based and direct crowd size signature plots, as per the “indirect inference” of Smith Jr (1993) and Gourieroux et al. (1993). Henceforth we refer to it simply as the “matching estimator”.

Specifically, we solve for $(\hat{\rho},\hat{\sigma})$ such that

\[
(\hat{\rho},\hat{\sigma}) = \arg\min_{(\rho,\sigma)} Q(\rho,\sigma), \tag{20}
\]

where

\[
Q(\rho,\sigma) = \frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)^{2},
\]

and the minimization is constrained such that $\sigma>0$ and $\rho\in\left]\frac{-1}{N-1},1\right[$.

4.1 Calculating the Solution

Calculating the minimum in equation (20) is very simple, because the bivariate minimization can be reduced to a univariate minimization in $\rho\in\left]\frac{-1}{N-1},1\right[$. To see this, recall that $MSE^{avg}_{NT}(k;\rho,\sigma)=\frac{\sigma^{2}}{k}[1+(k-1)\rho]$, so that the first-order condition for $\sigma^{2}$ is

\[
\frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)\left(\frac{1}{k}+\frac{k-1}{k}\rho\right)=0, \tag{21}
\]

and the first-order condition for $\rho$ is

\[
\frac{1}{N}\sum_{k=1}^{N}\left(\widehat{MSE}^{avg}_{NT}(k) - MSE^{avg}_{NT}(k;\rho,\sigma)\right)\left(1-\frac{1}{k}\right)\sigma^{2}=0. \tag{22}
\]

Combining equations (21) and (22) yields

\[
\sigma^{2}=\frac{c_{1}}{c_{2}+c_{3}\rho}, \tag{23}
\]

where $c_{1}=\sum_{k=1}^{N}\frac{\widehat{MSE}^{avg}_{NT}(k)}{k}$, $c_{2}=\sum_{k=1}^{N}\frac{1}{k^{2}}$, and $c_{3}=\sum_{k=1}^{N}\frac{k-1}{k^{2}}$. Hence, at the optimum and conditional on the data, there is a deterministic inverse relationship between $\rho$ and $\sigma$ (see Figure 9), enabling one to restrict the parameter search to the small open interval $\rho\in\left]\frac{-1}{N-1},1\right[$, as well as to explore the objective function visually as a function of $\rho$ alone (see Figure 10).
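As an illustration, here is a minimal Python sketch (ours) of the matching estimator, concentrating $\sigma^{2}$ out of the objective via equation (23) and then grid-searching over $\rho$; the direct signature plot mse_hat is assumed to be supplied by the user, and the grid resolution is an arbitrary choice.
\begin{verbatim}
import numpy as np

def fit_equicorrelation(mse_hat):
    # mse_hat[k-1] is the direct signature plot MSE-hat(k), k = 1, ..., N.
    N = len(mse_hat)
    k = np.arange(1, N + 1)
    # Constants from equation (23): at the optimum, sigma^2 = c1 / (c2 + c3 * rho).
    c1 = np.sum(mse_hat / k)
    c2 = np.sum(1.0 / k**2)
    c3 = np.sum((k - 1.0) / k**2)
    best = None
    for rho in np.linspace(-1.0 / (N - 1) + 1e-6, 1.0 - 1e-6, 20001):
        sigma2 = c1 / (c2 + c3 * rho)
        model = sigma2 / k * (1.0 + (k - 1.0) * rho)
        q = np.mean((mse_hat - model) ** 2)
        if best is None or q < best[0]:
            best = (q, rho, sigma2)
    return best  # (Q, rho_hat, sigma2_hat)

# Check: with an exactly equicorrelated input plot, the truth is recovered.
k = np.arange(1, 41)
print(fit_equicorrelation(18.6 / k * (1.0 + (k - 1.0) * 0.8)))  # Q ~ 0, rho ~ 0.8, sigma2 ~ 18.6
\end{verbatim}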

Figure 9: Illustration of the Relationship Between $\rho$ and $\sigma^{2}$
Notes: We show the relationship between $\rho$ and $\sigma^{2}$ given by equation (23), for 1-step-ahead growth forecast errors. The highlighted values of $\hat{\rho}=0.8$ and $\hat{\sigma}^{2}=18.6$ are the estimated values.

Figure 10: Illustration of the Objective Function and its Minimum
Left panel: $Q(\rho)$. Right panel: $Q(\rho)$ in a neighborhood of $\hat{\rho}$.
Notes: We show the objective function $Q(\rho)$ of the matching estimator expressed as a function of $\rho$, for 1-step-ahead growth forecast errors. The highlighted value of $\hat{\rho}=0.8$ is the estimated value.

Table 1: Equicorrelation Model Estimates

Growth
                                            h=1         h=2         h=3         h=4
  $\hat{\sigma}^{2}$                        18.562      21.170      22.713      23.275
                                            (0.606)     (0.546)     (0.589)     (0.646)
  $\hat{\rho}$                              0.801       0.843       0.842       0.831
                                            (0.036)     (0.028)     (0.028)     (0.030)
  $\widehat{R}^{avg}_{NT}(1;\hat{\rho})$    1.000       1.000       1.000       1.000
  $\widehat{R}^{avg}_{NT}(5;\hat{\rho})$    0.841       0.874       0.874       0.865
  $\widehat{R}^{avg}_{NT}(15;\hat{\rho})$   0.815       0.853       0.853       0.842
  $Q(\hat{\sigma}^{2},\hat{\rho})$          5.99E-05    3.95E-06    7.95E-06    1.01E-05

Inflation
                                            h=1         h=2         h=3         h=4
  $\hat{\sigma}^{2}$                        3.662       4.343       5.123       6.094
                                            (0.253)     (0.255)     (0.295)     (0.370)
  $\hat{\rho}$                              0.580       0.644       0.650       0.630
                                            (0.082)     (0.068)     (0.066)     (0.071)
  $\widehat{R}^{avg}_{NT}(1;\hat{\rho})$    1.000       1.000       1.000       1.000
  $\widehat{R}^{avg}_{NT}(5;\hat{\rho})$    0.664       0.715       0.720       0.704
  $\widehat{R}^{avg}_{NT}(15;\hat{\rho})$   0.608       0.668       0.673       0.655
  $Q(\hat{\sigma}^{2},\hat{\rho})$          2.21E-06    9.56E-07    1.82E-06    4.36E-05

Notes: We show equicorrelation model parameter estimates for SPF growth and inflation forecast errors at various horizons, with standard errors computed via 1000 bootstrap samples. We also show estimated relative $MSE$ with respect to no averaging, $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$, for $k=1,5,15$. In the final line of each panel we show the value of the objective function evaluated at the estimated parameters, $Q(\hat{\sigma}^{2},\hat{\rho})$.

Figure 11: Estimated Equicorrelation $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ Crowd Size Signature Plots
Left panel: Growth. Right panel: Inflation.
Notes: We show estimated equicorrelation $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ crowd size signature plots for SPF growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$.

Figure 12: Direct $\widehat{R}^{avg}_{NT}(k)$ and Estimated Equicorrelation $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$ Crowd Size Signature Plots
Top row: Direct, Growth (left) and Est. Equicorrelation, Growth (right). Bottom row: Direct, Inflation (left) and Est. Equicorrelation, Inflation (right).
Notes: We show direct and estimated equicorrelation ratio crowd size signature plots ($\widehat{R}^{avg}_{NT}(k)$ and $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$, respectively) for growth and inflation forecasts at horizons $h=1,2,3,4$, for group sizes $k=1,2,\ldots,20$.

In Table 1 we show the complete set of estimates (for $\sigma^{2}$ and $\rho$, for growth and inflation, for $h=1,\ldots,4$). $\hat{\sigma}^{2}$ increases with forecast horizon, reflecting the fact that the distant future is harder to forecast than the near future, and implying that the fitted equicorrelation $MSE$ signature plots, $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$, should shift upward with horizon, as confirmed in Figure 11. Comparison of the direct $\widehat{MSE}^{avg}_{NT}(k)$ signature plots in Figure 3 with the equicorrelation model-based $\widehat{MSE}^{avg}_{NT}(k;\hat{\rho},\hat{\sigma})$ signature plots in Figure 11 reveals a remarkably good equicorrelation model fit. We emphasize this in Figure 12, in which we show side-by-side direct $\widehat{R}^{avg}_{NT}(k)$ (left column) and equicorrelation model-based $\widehat{R}^{avg}_{NT}(k;\hat{\rho})$ (right column) signature plots.
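As a quick consistency check, the relative-MSE rows of Table 1 follow directly from $\hat{\rho}$ via $\widehat{R}^{avg}_{NT}(k;\hat{\rho})=\frac{1}{k}[1+(k-1)\hat{\rho}]$; a few lines of Python (ours) reproduce, for example, the growth $h=2$ and inflation $h=1$ entries.
\begin{verbatim}
# rho-hat values from Table 1 (growth, h = 2; inflation, h = 1).
for label, rho in [("growth h=2", 0.843), ("inflation h=1", 0.580)]:
    print(label, [round((1 + (k - 1) * rho) / k, 3) for k in (1, 5, 15)])
# growth h=2    [1.0, 0.874, 0.853]
# inflation h=1 [1.0, 0.664, 0.608]
\end{verbatim}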

4.2 Understanding the Near-Perfect Equicorrelation Fit

Here we present a closed-form solution for the direct crowd size signature plot. The result is significant in its own right and reveals why our numerical matching estimates for the equicorrelation model produce fitted signature plots that align so closely with direct signature plots. To maintain precision it will prove useful to state it as a formal theorem.

Theorem: Let $e_{t}$ be any covariance stationary $N\times 1$ vector with mean zero and covariance matrix $\Sigma$, given by

\[
\mathbf{\Sigma}=\begin{pmatrix}
\sigma_{1}^{2} & c_{12} & c_{13} & \cdots & c_{1N}\\
c_{21} & \sigma_{2}^{2} & c_{23} & \cdots & c_{2N}\\
c_{31} & c_{32} & \sigma_{3}^{2} & \cdots & c_{3N}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c_{N1} & c_{N2} & c_{N3} & \cdots & \sigma_{N}^{2}
\end{pmatrix},
\]

and define the $k$-average $MSE$,

\[
MSE^{avg}(k)=\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}} e_{it}\right)^{2}\right],
\]

where $g_{k}$ represents any subset of $e_{t}$ of size $k$ ($k\in[1,N]$). Then

\[
MSE^{avg}(k)=\frac{\overline{\sigma^{2}}}{k}\left(1+(k-1)\overline{\rho}\right) \tag{24}
\]

and

\[
R^{avg}(k)=\frac{MSE^{avg}(k)}{MSE^{avg}(1)}=\frac{1}{k}\left(1+(k-1)\overline{\rho}\right), \tag{25}
\]

where

\[
\overline{\sigma^{2}}=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{2}, \qquad
\overline{c}=\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}, \qquad
\overline{\rho}=\frac{\overline{c}}{\overline{\sigma^{2}}}.
\]

Proof: We have:

\begin{align*}
MSE^{avg}(k) &= \frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}} E\left[\left(\frac{1}{k}\sum_{i\in g_{k}} e_{it}\right)^{2}\right]\\
&= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\sum_{g_{k}=1}^{\binom{N}{k}}\left[\sum_{i\in g_{k}}\sigma_{i}^{2}+\sum_{i,j\in g_{k},\,i\neq j}c_{ij}\right]\\
&= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\left[\underbrace{\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i\in g_{k}}\sigma_{i}^{2}}_{\text{grand sum of variances}}+\underbrace{\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i,j\in g_{k},\,i\neq j}c_{ij}}_{\text{grand sum of covariances}}\right].
\end{align*}

First consider the term related to variance. Note that, of the $\binom{N}{k}$ groups, there are $\binom{N-1}{k-1}$ groups that include forecaster $i$, and hence $\sigma_{i}^{2}$. The sum of $\sum_{i\in g_{k}}\sigma_{i}^{2}$ over all groups $g_{k}$ is therefore

\[
\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i\in g_{k}}\sigma_{i}^{2}=\binom{N-1}{k-1}\sum_{i=1}^{N}\sigma_{i}^{2}. \tag{26}
\]

Now consider the term related to covariance. When summing $\sum_{i,j\in g_{k},\,i\neq j}c_{ij}$ across all groups $g_{k}$, covariances between all possible pairs of $e_{it}$ and $e_{jt}$ are accounted for. Because we are summing over all arbitrary groups $g_{k}$, each pair $(i,j)$ appears the same number of times in the grand summation. To compute this number we observe that each group $g_{k}$ contains $k(k-1)$ pairwise covariances and that there are $\binom{N}{k}$ possible groups. Hence the total number of individual covariance terms in the grand sum is $k(k-1)\binom{N}{k}$. The number of times that each individual covariance term $c_{ij}$ appears in the grand sum is $\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}$, where $\binom{N}{2}$ is the total number of distinct pairs $(i,j)$. The grand sum of covariances is therefore

\[
\sum_{g_{k}=1}^{\binom{N}{k}}\sum_{i,j\in g_{k},\,i\neq j}c_{ij}=\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}. \tag{27}
\]

Combining results (26) and (27), we have:

\begin{align*}
MSE^{avg}(k) &= \frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\binom{N-1}{k-1}\sum_{i=1}^{N}\sigma_{i}^{2}+\frac{1}{k^{2}}\frac{1}{\binom{N}{k}}\frac{k(k-1)\binom{N}{k}}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}\\
&= \frac{1}{k}\left[\left(\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{2}\right)+(k-1)\left(\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}c_{ij}\right)\right]\\
&= \frac{\overline{\sigma^{2}}}{k}\left[1+(k-1)\overline{\rho}\right].
\end{align*}

This completes the proof. $\square$
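To illustrate, the following Python sketch (ours; the value of $N$ and the random covariance matrix are arbitrary) verifies equation (24) by brute-force enumeration of all $\binom{N}{k}$ subsets.
\begin{verbatim}
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 7
A = rng.normal(size=(N, N))
Sigma = A @ A.T                                       # an arbitrary positive-definite covariance matrix

sigma2_bar = np.mean(np.diag(Sigma))                  # average variance
c_bar = np.mean(Sigma[np.triu_indices(N, k=1)])       # average pairwise covariance
rho_bar = c_bar / sigma2_bar

for k in range(1, N + 1):
    # Direct computation: average, over all size-k subsets, of the variance of the subset mean.
    direct = np.mean([Sigma[np.ix_(g, g)].sum() / k**2
                      for g in itertools.combinations(range(N), k)])
    closed_form = sigma2_bar / k * (1.0 + (k - 1.0) * rho_bar)   # equation (24)
    assert np.isclose(direct, closed_form)
print("Equation (24) verified for k = 1, ...,", N)
\end{verbatim}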

Several remarks are in order:

  (a) Equation (24) reveals that the direct crowd size signature plot is simply the equicorrelation model-based signature plot evaluated at particular values of the equicorrelation model parameters. This is true despite the fact that (24) does not require the forecast errors to be truly equicorrelated. Hence the “best-matching” equicorrelation model-based signature plot will always match the direct plot perfectly, regardless of whether the forecast errors are truly equicorrelated.

  (b) Equation (24) suggests an alternative, closed-form, matching estimator for the equicorrelation model (see the sketch after this list):
\[
\widehat{\sigma}^{2}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\sigma}_{i}^{2} \tag{28}
\]
\[
\widehat{\rho}=\frac{\frac{1}{\binom{N}{2}}\sum_{1\leq i<j\leq N}\widehat{c}_{ij}}{\frac{1}{N}\sum_{i=1}^{N}\widehat{\sigma}_{i}^{2}}. \tag{29}
\]
  (c) Assessment of whether the forecast errors are truly equicorrelated could be done (under much stronger assumptions) by maximum-likelihood estimation of a dynamic-factor model, followed by likelihood-ratio tests of the restrictions implied by equicorrelation, as sketched in Appendix B for both weak and strong equicorrelation.
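A minimal Python sketch (ours) of the closed-form estimator (28)-(29), assuming a balanced $T\times N$ panel of forecast errors with no missing values (real SPF panels are unbalanced, so in practice the variances and covariances would be estimated pairwise):
\begin{verbatim}
import numpy as np

def closed_form_equicorrelation(errors):
    # errors: T x N array of forecast errors (rows are periods, columns are forecasters).
    Sigma_hat = np.cov(errors, rowvar=False)               # N x N sample covariance matrix
    N = Sigma_hat.shape[0]
    sigma2_hat = np.mean(np.diag(Sigma_hat))               # equation (28): average variance
    c_hat = np.mean(Sigma_hat[np.triu_indices(N, k=1)])    # average pairwise covariance
    return sigma2_hat, c_hat / sigma2_hat                  # (sigma^2-hat, rho-hat), equation (29)
\end{verbatim}
By the theorem, these are the parameter values at which the equicorrelation model-based signature plot coincides with the direct signature plot.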

5 Summary, Conclusions, and Directions for Future Research

We have studied the properties of macroeconomic survey forecast response averages as the number of survey respondents grows, characterizing the speed and pattern of the “gains from diversification” and their eventual decrease with “portfolio size” (the number of survey respondents) in both (1) the key real-world data-based environment of the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical model-based environment of equicorrelated forecast errors. We proceeded by proposing and comparing various direct and model-based “crowd size signature plots”, which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We then estimated the equicorrelation model for growth and inflation forecast errors by choosing model parameters to minimize the divergence between direct and model-based signature plots.

The results indicate near-perfect equicorrelation model fit for both growth and inflation, which we explicated by showing analytically that, under conditions, the direct and fitted equicorrelation model-based signature plots are identical at a particular model parameter configuration, which we characterize. We find that the gains from diversification are greater for inflation forecasts than for growth forecasts, but that both the inflation and growth diversification gains nevertheless decrease quite quickly, so that fewer SPF respondents than currently used may be adequate.

Several directions for future research appear promising, including, in no particular order:

  (a) Instead of considering $MSE$s across $\binom{N}{k}$ possible $k$-average forecasts and averaging to obtain a “representative $k$-average” forecast $MSE$ as a function of $k$, one may want to consider “best $k$-average” forecast $MSE$ as a function of $k$, where the unique best $k$-average forecast is obtained in each period as the $k$-average that performed best historically.

  (b) One may want to allow for time-varying equicorrelation parameters, as $\sigma^{2}$ might, for example, move downward with the Great Moderation, while $\rho$ might move counter-cyclically (a small simulation sketch appears after this list). The strong equicorrelation model in dynamic-factor form becomes
\[
e_{it}=\delta_{t}z_{t}+w_{it}
\]
\[
z_{t}=\phi z_{t-1}+v_{t},
\]
where $w_{it}\sim iid(0,\sigma_{wt}^{2})$, $v_{t}\sim iid(0,\sigma_{v}^{2})$, and $w_{it}\perp v_{t},~\forall i,t$. Immediately,
\[
e_{t}\sim iid\left(0,~\Sigma_{t}(\rho_{t})\right),
\]
where
\[
\Sigma_{t}(\rho_{t})=\sigma_{t}^{2}\begin{pmatrix}
1 & \rho_{t} & \cdots & \rho_{t}\\
\rho_{t} & 1 & \cdots & \rho_{t}\\
\vdots & \vdots & \ddots & \vdots\\
\rho_{t} & \rho_{t} & \cdots & 1
\end{pmatrix}
\]
\[
\sigma_{t}^{2}=\delta_{t}^{2}\,var(z_{t})+\sigma_{wt}^{2}
\]
\[
\rho_{t}=\frac{\delta_{t}^{2}\,var(z_{t})}{\delta_{t}^{2}\,var(z_{t})+\sigma_{wt}^{2}}.
\]
  (c) One may want to complement our exploration of the U.S. SPF with a comparative exploration of the European SPF.\footnote{For an introduction to the European SPF, see the materials at https://data.ecb.europa.eu/methodology/survey-professional-forecasters-spf.} Doing so appears feasible but non-trivial, due to cross-survey differences in sample periods, economic indicator concepts (e.g., inflation), and timing conventions, and we reserve it for future work.
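As one illustration of direction (b), the following Python sketch (ours, with purely illustrative parameter values) simulates the dynamic-factor form of the model with constant parameters and checks the implied cross-sectional correlation $\rho=\delta^{2}var(z_{t})/(\delta^{2}var(z_{t})+\sigma_{w}^{2})$; making $\delta_{t}$ and $\sigma_{wt}$ functions of $t$ delivers the time-varying version.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
N, T = 40, 100_000                        # long sample so that sample moments are accurate
delta, phi, sigma_w, sigma_v = 1.0, 0.5, 1.0, 1.0

z = np.zeros(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + sigma_v * rng.normal()
e = delta * z[:, None] + sigma_w * rng.normal(size=(T, N))    # e_it = delta * z_t + w_it

var_z = sigma_v**2 / (1.0 - phi**2)
rho_implied = delta**2 * var_z / (delta**2 * var_z + sigma_w**2)
C = np.corrcoef(e, rowvar=False)
rho_sample = np.mean(C[np.triu_indices(N, k=1)])
print(round(rho_implied, 3), round(rho_sample, 3))            # both close to 0.571
\end{verbatim}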


Appendix A Data Definitions and Sources

We obtain U.S. quarterly level forecasts of real output and the GDP deflator from the Federal Reserve Bank of Philadelphia’s Individual Forecasts: Survey of Professional Forecasters (variables $RGDP$ and $PGDP$, respectively). We transform the level forecasts into annualized growth rate forecasts using:

\[
g_{t+h|t-1}=100\left(\left(\frac{f_{t+h|t-1}}{f_{t+h-1|t-1}}\right)^{4}-1\right), \tag{A1}
\]

where $f_{t+h|t-1}$ is a quarterly level forecast (either $RGDP$ or $PGDP$) for quarter $t+h$ made using information available in quarter $t-1$. For additional information, see https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/individual-forecasts.
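A one-line Python version of transformation (A1) (ours; the variable names are illustrative):
\begin{verbatim}
def annualized_growth(level_t_plus_h, level_t_plus_h_minus_1):
    # Equation (A1): annualized quarter-over-quarter growth rate, in percent.
    return 100.0 * ((level_t_plus_h / level_t_plus_h_minus_1) ** 4 - 1.0)

print(annualized_growth(20_100.0, 20_000.0))   # a 0.5 percent quarterly rise is about 2.0 percent annualized
\end{verbatim}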

We obtain the corresponding realizations from the Federal Reserve Bank of Philadelphia’s Forecast Error Statistics for the Survey of Professional Forecasters (December 2023 vintage). The realizations are reported as annualized growth rates, as in equation (A1) above, so there is no need for additional transformation. For additional information, see https://www.philadelphiafed.org/surveys-and-data/real-time-data-research/error-statistics.

Appendix B Strong Equicorrelation, Weak Equicorrelation, and Factor Structure

Consider a standard model of dynamic single-factor structure,

\[
e_{it}=\delta_{i}z_{t}+w_{it} \tag{B1}
\]
\[
z_{t}=\phi z_{t-1}+v_{t}, \tag{B2}
\]

where $w_{it}\sim iid(0,\sigma_{wi}^2)$, $v_t\sim iid(0,\sigma_v^2)$, and $w_{it}\perp v_t$, $\forall i,t$. The implied forecast error covariance matrix $\Sigma$ fails to satisfy equicorrelation; that is,

\[
\Sigma~\neq~\sigma^2\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix},
\]

because the forecast error variances generally vary with $i$, and their correlations generally vary with $i$ and $j$. In particular, simple calculations reveal that

\[
\begin{split}
\sigma^2_i\equiv var(e_{i,t})&=\delta_i^2\,var(z_t)+\sigma_{wi}^2\\
&=\delta_i^2\left(var(z_t)+\frac{\sigma_{wi}^2}{\delta_i^2}\right),~\forall i,
\end{split}
\]

where $var(z_t)=\frac{\sigma_v^2}{1-\phi^2}$, and

\[
\begin{split}
\rho_{ij}\equiv corr(e_{i,t},e_{j,t})&=\frac{\delta_i\delta_j\,var(z_t)}{\sqrt{\delta_i^2\,var(z_t)+\sigma_{wi}^2}\sqrt{\delta_j^2\,var(z_t)+\sigma_{wj}^2}}\\
&=\frac{1}{\sqrt{1+\frac{\sigma_{wi}^2}{\delta_i^2\,var(z_t)}}\sqrt{1+\frac{\sigma_{wj}^2}{\delta_j^2\,var(z_t)}}}\,,~\forall i,j.
\end{split} \tag{B3}
\]

Nevertheless, certain simple restrictions on the dynamic factor model (DFM) (B1)-(B2) produce certain forms of equicorrelation. First, from equation (B3), it is apparent that $\rho_{ij}=\rho,~\forall i\neq j$ if and only if

\[
\frac{\sigma_{wi}^2}{\delta_i^2}=\frac{\sigma_{wj}^2}{\delta_j^2},~\forall i\neq j, \tag{B4}
\]

so that imposition of the constraint (B4) on the measurement equation (B1) produces a “weak” form of equicorrelation, with identical correlations ($\rho$) but potentially different forecast error variances ($\sigma_1^2,...,\sigma_N^2$). That is,

\[
\Sigma=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2&\cdots&\rho\sigma_1\sigma_N\\ \rho\sigma_2\sigma_1&\sigma_2^2&\cdots&\rho\sigma_2\sigma_N\\ \vdots&\vdots&\ddots&\vdots\\ \rho\sigma_N\sigma_1&\rho\sigma_N\sigma_2&\cdots&\sigma_N^2\end{pmatrix}.
\]

Second, it is also apparent from equation (B3) that if we impose the stronger restriction,

\[
\sigma_{wi}^2=\sigma_{wj}^2~~{\rm and}~~\delta_i=\delta_j,~\forall i,j, \tag{B5}
\]

which of course implies the weaker restriction (B4), then we obtain (“strong”) equicorrelation as we have defined it throughout this paper, with identical correlations ($\rho$) and identical forecast error variances ($\sigma^2$). That is,

\[
\Sigma~=~\sigma^2\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}.
\]
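As a quick numerical illustration, the following sketch (with arbitrary parameter values of our choosing) builds the covariance matrix implied by (B1)-(B2) and confirms that imposing (B4) equalizes the error correlations, while imposing (B5) additionally equalizes the error variances.

```python
import numpy as np

def implied_cov(delta, sigma_w2, var_z):
    """Forecast error covariance implied by the single-factor model (B1)-(B2)."""
    delta = np.asarray(delta, dtype=float)
    return var_z * np.outer(delta, delta) + np.diag(np.asarray(sigma_w2, dtype=float))

def corr(S):
    """Correlation matrix corresponding to a covariance matrix S."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

var_z = 2.0                                   # var(z_t) = sigma_v^2 / (1 - phi^2)

# Restriction (B4): sigma_wi^2 / delta_i^2 equal across i -> equal correlations,
# but generally different variances ("weak" equicorrelation).
delta = np.array([0.5, 1.0, 2.0])
sigma_w2 = 3.0 * delta**2
S_weak = implied_cov(delta, sigma_w2, var_z)
print(np.round(corr(S_weak), 4))              # identical off-diagonal correlations (0.4)
print(np.round(np.diag(S_weak), 4))           # unequal variances

# Restriction (B5): common delta and common sigma_w^2 -> "strong" equicorrelation.
S_strong = implied_cov([1.0, 1.0, 1.0], [3.0, 3.0, 3.0], var_z)
print(np.round(corr(S_strong), 4))            # identical correlations
print(np.round(np.diag(S_strong), 4))         # identical variances
```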

Although we do not pursue maximum-likelihood estimation in this paper, we note that one may estimate the unrestricted DFM (B1)-(B2) by exact Gaussian pseudo maximum likelihood (ML). The model is already in state-space form, so a single pass of the Kalman filter yields the innovations needed to construct and evaluate the likelihood, and it also handles the missing observations associated with survey entry and exit. One may also impose the weak or strong equicorrelation restrictions (B4) or (B5), respectively, and assess them using likelihood-ratio tests.
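To make the preceding paragraph concrete, here is a minimal sketch (our own Python code, not drawn from any existing package) of the Kalman-filter evaluation of the Gaussian (pseudo-) log-likelihood for (B1)-(B2), where NaN entries in the forecast-error panel mark survey entry and exit; the parameter names and the panel layout are our assumptions.

```python
import numpy as np

def dfm_loglik(E, delta, sigma_w2, phi, sigma_v2):
    """Gaussian log-likelihood of the single-factor model (B1)-(B2) via the Kalman filter.

    E        : (N, T) array of forecast errors; NaN = missing (survey entry/exit)
    delta    : (N,) factor loadings
    sigma_w2 : (N,) idiosyncratic shock variances
    phi, sigma_v2 : factor AR(1) coefficient (|phi| < 1) and innovation variance
    """
    delta = np.asarray(delta, dtype=float)
    sigma_w2 = np.asarray(sigma_w2, dtype=float)
    N, T = E.shape
    z, P = 0.0, sigma_v2 / (1.0 - phi**2)       # stationary initialization of z_t
    loglik = 0.0
    for t in range(T):
        # Prediction step for the scalar state z_t
        z, P = phi * z, phi**2 * P + sigma_v2
        obs = ~np.isnan(E[:, t])
        if not obs.any():
            continue
        d, e = delta[obs], E[obs, t]
        # Innovations and their covariance, using observed respondents only
        v = e - d * z
        F = P * np.outer(d, d) + np.diag(sigma_w2[obs])
        Finv = np.linalg.inv(F)
        _, logdet = np.linalg.slogdet(F)
        loglik += -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + v @ Finv @ v)
        # Update step
        K = P * (Finv @ d)                       # Kalman gain for the scalar state
        z = z + K @ v
        P = P - P * (d @ K)
    return loglik
```

Maximizing this function over $(\delta,\sigma_w^2,\phi,\sigma_v^2)$ with a numerical optimizer would deliver the ML estimates; imposing (B4) or (B5) amounts to restricting the parameter vector before optimization.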

Appendix C Optimal Combining Weights Under Weak Equicorrelation

Here we briefly consider the “weak equicorrelation” case, with common correlation $\rho$ and variances $\sigma_1^2,\sigma_2^2,...,\sigma_N^2$; that is,

\[
\Sigma=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2&\cdots&\rho\sigma_1\sigma_N\\ \rho\sigma_2\sigma_1&\sigma_2^2&\cdots&\rho\sigma_2\sigma_N\\ \vdots&\vdots&\ddots&\vdots\\ \rho\sigma_N\sigma_1&\rho\sigma_N\sigma_2&\cdots&\sigma_N^2\end{pmatrix},
\]

where $\rho\in\left]\frac{-1}{N-1},1\right[$. We can decompose the covariance matrix $\Sigma$ as

\[
\Sigma=DRD=\begin{pmatrix}\sigma_1&0&\cdots&0\\ 0&\sigma_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N\end{pmatrix}\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}\begin{pmatrix}\sigma_1&0&\cdots&0\\ 0&\sigma_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N\end{pmatrix},
\]

where $R$ is positive definite if and only if $\rho\in\left]\frac{-1}{N-1},1\right[$. The inverse of the covariance matrix is

\[
\Sigma^{-1}=D^{-1}R^{-1}D^{-1}=\begin{pmatrix}\sigma_1^{-1}&0&\cdots&0\\ 0&\sigma_2^{-1}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N^{-1}\end{pmatrix}\begin{pmatrix}1&\rho&\cdots&\rho\\ \rho&1&\cdots&\rho\\ \vdots&\vdots&\ddots&\vdots\\ \rho&\rho&\cdots&1\end{pmatrix}^{-1}\begin{pmatrix}\sigma_1^{-1}&0&\cdots&0\\ 0&\sigma_2^{-1}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_N^{-1}\end{pmatrix},
\]

where

\[
R^{-1}=\frac{1}{1-\rho}I-\frac{\rho}{(1-\rho)(1+(N-1)\rho)}\iota\iota^{\prime},
\]

$I$ denotes the $N\times N$ identity matrix, and $\iota$ is an $N$-vector of ones.
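The closed form for $R^{-1}$ follows from a Sherman-Morrison-type rank-one update; a short numerical check (a sketch with arbitrary $N$ and $\rho$ of our choosing) is:

```python
import numpy as np

N, rho = 5, 0.3
iota = np.ones(N)
R = (1 - rho) * np.eye(N) + rho * np.outer(iota, iota)   # equicorrelation matrix
R_inv_closed = (np.eye(N) / (1 - rho)
                - rho / ((1 - rho) * (1 + (N - 1) * rho)) * np.outer(iota, iota))
print(np.allclose(R_inv_closed, np.linalg.inv(R)))        # True
```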

Recall that, as noted in the text, the optimal combining weight vector is

\[
\lambda^{*}=\left(\iota^{\prime}\Sigma^{-1}\iota\right)^{-1}\Sigma^{-1}\iota. \tag{C1}
\]

The first part of the optimal combining weight (C1) is

\[
\iota^{\prime}\Sigma^{-1}\iota=\frac{(1+(N-1)\rho)\left(\sum_{i=1}^{N}\sigma_i^{-2}\right)-\rho\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)}{(1-\rho)(1+(N-1)\rho)}, \tag{C2}
\]

and the second part is

\[
\Sigma^{-1}\iota=\frac{(1+(N-1)\rho)\boldsymbol{\sigma}^{-2}-\rho\left(\sum_{i=1}^{N}\sigma_i^{-1}\right)\boldsymbol{\sigma}^{-1}}{(1-\rho)(1+(N-1)\rho)}, \tag{C3}
\]

where

\[
\boldsymbol{\sigma}^{-2}=\begin{pmatrix}\sigma_1^{-2}\\ \sigma_2^{-2}\\ \vdots\\ \sigma_N^{-2}\end{pmatrix}\quad\text{and}\quad\boldsymbol{\sigma}^{-1}=\begin{pmatrix}\sigma_1^{-1}\\ \sigma_2^{-1}\\ \vdots\\ \sigma_N^{-1}\end{pmatrix}.
\]

Inserting equations (C2) and (C3) into equation (C1), we get the optimal weight for the $i$th forecast as

\[
\lambda_i^{*}=\frac{\sigma_i^{-2}+\rho(N-2)\sigma_i^{-2}-\rho\left(\sum_{j\neq i}\sigma_i^{-1}\sigma_j^{-1}\right)}{\sum_{i=1}^{N}\left(\sigma_i^{-2}+\rho(N-2)\sigma_i^{-2}-\rho\left(\sum_{j\neq i}\sigma_i^{-1}\sigma_j^{-1}\right)\right)}. \tag{C4}
\]

To check the formula, note that for $N=2$ we obtain the standard Bates and Granger (1969) optimal bivariate combining weight,

\[
\lambda_1^{*}=\frac{\sigma_1^{-2}-\rho\sigma_1^{-1}\sigma_2^{-1}}{\sigma_1^{-2}+\sigma_2^{-2}-2\rho\sigma_1^{-1}\sigma_2^{-1}}=\frac{\sigma_2^{2}-\rho\sigma_1\sigma_2}{\sigma_1^{2}+\sigma_2^{2}-2\rho\sigma_1\sigma_2},
\]

and for any $N$, but with $\sigma_j^2=\sigma^2$ $\forall j$ (strong equicorrelation), we obtain equal weights,

\[
\lambda_i^{*}=\frac{(1-\rho)\sigma^{-2}}{N(1-\rho)\sigma^{-2}}=\frac{1}{N},~~\forall i.
\]
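As a final numerical check (a sketch with arbitrary illustrative variances and correlation), the closed-form weights (C4) can be compared with the direct computation (C1), and they collapse to $1/N$ when all variances are equal.

```python
import numpy as np

def optimal_weights_closed_form(sigma, rho):
    """Optimal combining weights under weak equicorrelation, equation (C4)."""
    s1, s2 = 1.0 / sigma, 1.0 / sigma**2                  # sigma_i^{-1} and sigma_i^{-2}
    N = len(sigma)
    cross = s1 * (s1.sum() - s1)                           # sum_{j != i} sigma_i^{-1} sigma_j^{-1}
    num = s2 + rho * (N - 2) * s2 - rho * cross
    return num / num.sum()

def optimal_weights_direct(sigma, rho):
    """Optimal combining weights from (C1) with Sigma = D R D."""
    N = len(sigma)
    D = np.diag(sigma)
    R = (1 - rho) * np.eye(N) + rho * np.ones((N, N))
    Sigma_inv = np.linalg.inv(D @ R @ D)
    iota = np.ones(N)
    return Sigma_inv @ iota / (iota @ Sigma_inv @ iota)

sigma = np.array([1.0, 1.5, 2.0, 3.0])
print(np.allclose(optimal_weights_closed_form(sigma, 0.4),
                  optimal_weights_direct(sigma, 0.4)))     # True
print(optimal_weights_closed_form(np.full(4, 2.0), 0.4))   # equal weights 1/N
```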

References

  • Aliber et al. (2023) Aliber, R.Z., C.P. Kindleberger, and R.N. McCauley (2023), Manias, Panics, and Crashes: A History of Financial Crises, 8th Edition, Palgrave Macmillan.
  • Batchelor and Dua (1995) Batchelor, R. and P. Dua (1995), “Forecaster Diversity and the Benefits of Combining Forecasts,” Management Science,  41, 68–75.
  • Bates and Granger (1969) Bates, J.M. and C.W.J. Granger (1969), “The Combination of Forecasts,” Operational Research Quarterly,  20, 451–468.
  • Clemen (1989) Clemen, R.T. (1989), “Combining Forecasts: A Review and Annotated Bibliography (With Discussion),” International Journal of Forecasting,  5, 559–583.
  • Croushore and Stark (2019) Croushore, D. and T. Stark (2019), “Fifty Years of the Survey of Professional Forecasters,” Economic Insights,  4, 1–11.
  • Diebold and Shin (2019) Diebold, F.X. and M. Shin (2019), “Machine Learning for Regularized Survey Forecast Combination: Partially-Egalitarian Lasso and its Derivatives,” International Journal of Forecasting,  35, 1679–1691.
  • Diebold et al. (2023) Diebold, F.X., M. Shin, and B. Zhang (2023), “On the Aggregation of Probability Assessments: Regularized Mixtures of Predictive Densities for Eurozone Inflation and Real Interest Rates,” Journal of Econometrics,  237, 105321.
  • Elliott (2011) Elliott, G. (2011), “Averaging and the Optimal Combination of Forecasts,” Manuscript, Department of Economics, UCSD.
  • Elliott and Timmermann (2016) Elliott, G. and A. Timmermann (2016), Economic Forecasting, Princeton University Press.
  • Engle and Kelly (2012) Engle, R.F. and B.T. Kelly (2012), “Dynamic Equicorrelation,” Journal of Business and Economic Statistics,  30, 212–228.
  • Genre et al. (2013) Genre, V., G. Kenny, A. Meyler, and A. Timmermann (2013), “Combining Expert Forecasts: Can Anything Beat the Simple Average?” International Journal of Forecasting,  29, 108–121.
  • Gourieroux et al. (1993) Gourieroux, C., A. Monfort, and E. Renault (1993), “Indirect Inference,” Journal of Applied Econometrics,  8, S85–S118.
  • Makridakis and Winkler (1983) Makridakis, S. and R.L. Winkler (1983), “Averages of Forecasts: Some Empirical Results,” Management Science,  29, 987–996.
  • Smith Jr (1993) Smith Jr, A.A. (1993), “Estimating Nonlinear Time-Series Models using Simulated Vector Autoregressions,” Journal of Applied Econometrics,  8, S63–S84.
  • Stock and Watson (2016) Stock, J.H. and M.W. Watson (2016), “Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics,” In J.B. Taylor and H. Uhlig (eds.), Handbook of Macroeconomics, vol. 2A, Elsevier, 415–526.
  • Surowiecki (2005) Surowiecki, J. (2005), The Wisdom of Crowds, Vintage Books.