Several solutions to the problem of asset return covariance matrix estimation have been suggested in the traditional investment literature. The most common estimator of the return covariance matrix is the sample covariance matrix of historical returns.
The sample covariance matrix is

S = (1/(T − 1)) Σ_{t=1}^{T} (H_t − H̄)(H_t − H̄)′

where T is the sample size, H_t is an N×1 vector of hedge fund returns in period t, N is the number of assets in the portfolio, and H̄ = (1/T) Σ_{t=1}^{T} H_t is the average of these return vectors. We denote by S_ij the (i, j) entry of S. A problem with this estimator is that a covariance matrix typically has too many parameters relative to the available data: if the number of assets in the portfolio is N, there are indeed N(N − 1)/2 different covariance terms to be estimated. The problem is particularly acute in the context of alternative investment strategies, even when a limited set of funds or indices is considered, because hedge fund returns, available only on a monthly basis, provide insufficiently frequent data.

One possible cure for the curse of dimensionality in covariance matrix estimation is to impose some structure on the covariance matrix to reduce the number of parameters to be estimated. In the case of asset returns, a low-dimensional linear factor structure seems natural and consistent with standard asset pricing theory, as linear multifactor models can be economically justified through equilibrium arguments (cf. Merton's Intertemporal Capital Asset Pricing Model, 1973) or arbitrage arguments (cf. Ross's Arbitrage Pricing Theory, 1976). Therefore, in what follows, we shall focus on K-factor models with uncorrelated residuals. Of course, this leaves two very important questions: How much structure should be imposed? (The fewer the factors, the stronger the structure.) And what factors should be used? There is a standard tradeoff between model risk and estimation risk. The following options are available:

• Impose no structure. This choice involves low specification error and high sampling error, and has led to the use of the sample covariance matrix.

• Impose some structure. This choice involves high specification error and low sampling error. Several models fall within this category, including the constant correlation approach (Elton and Gruber, 1973), the single-factor forecast (Sharpe, 1963) and the multifactor forecast (e.g., Chan, Karceski and Lakonishok, 1999).

• Impose optimal structure. This choice involves medium specification error and medium sampling error. The optimal tradeoff between specification error and sampling error has led to an optimal shrinkage towards the grand mean (Jorion, 1985, 1986), to an optimal shrinkage towards the single-factor model (Ledoit, 1999), or to the introduction of portfolio constraints (Jagannathan and Ma, 2003).
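As an illustration of the second and third options, the sketch below (in Python/NumPy, not the estimators used in the studies cited above) builds the sample covariance matrix, a constant-correlation structured target in the spirit of Elton and Gruber (1973), and a shrinkage combination of the two. The shrinkage intensity `delta` is left as a free parameter; in the optimal-shrinkage literature it would itself be estimated.

```python
import numpy as np

def sample_cov(H):
    """Sample covariance matrix of a T x N matrix of returns (rows = periods)."""
    T = H.shape[0]
    Hc = H - H.mean(axis=0)               # de-mean each asset's return series
    return Hc.T @ Hc / (T - 1)

def constant_correlation_target(S):
    """Structured target: every pair of assets shares the average sample
    correlation, while individual variances are kept from the sample matrix."""
    sd = np.sqrt(np.diag(S))
    R = S / np.outer(sd, sd)              # sample correlation matrix
    N = R.shape[0]
    rbar = (R.sum() - N) / (N * (N - 1))  # average off-diagonal correlation
    F = rbar * np.outer(sd, sd)
    np.fill_diagonal(F, np.diag(S))
    return F

def shrink(S, F, delta):
    """Shrinkage estimator: convex combination of sample matrix and target."""
    return delta * F + (1 - delta) * S
```

With `delta = 0` the estimator reduces to the pure sample covariance matrix (no structure); with `delta = 1` it imposes the full constant-correlation structure.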
We have N variables h_i, i = 1, ..., N, i.e., monthly returns for N different hedge fund indices, and T observations of these variables, collected in a T×N matrix H. PCA enables us to decompose h_t^k as follows:

h_t^k = Σ_{j=1}^{N} √λ_j u_{kj} v_{tj}
where:

• U is the matrix of the N eigenvectors of H′H

• V is the matrix of the N eigenvectors of HH′, with λ_1 ≥ … ≥ λ_N the associated eigenvalues of H′H.
Retaining only the first K principal components yields the K-factor approximation

h_t^k = Σ_{j=1}^{K} √λ_j u_{kj} v_{tj} + ε_t^k

where some structure is imposed by assuming that the residuals ε_t^k are uncorrelated with one another. The percentage of variance explained by the first K factors is given by:

(Σ_{j=1}^{K} λ_j) / (Σ_{j=1}^{N} λ_j)

where the λ_j are the eigenvalues of H′H, sorted in decreasing order.
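A minimal NumPy sketch of this construction (an illustration, not the exact implementation used in the paper): eigen-decompose the sample covariance matrix, keep the K largest components as factors, and place the leftover variance on the diagonal so that residuals are uncorrelated.

```python
import numpy as np

def kfactor_cov(H, K):
    """Covariance estimate with a K-factor structure obtained by PCA:
    keep the K largest principal components and assign the leftover
    variance to a diagonal residual matrix (uncorrelated residuals)."""
    T, N = H.shape
    Hc = H - H.mean(axis=0)
    S = Hc.T @ Hc / (T - 1)                # sample covariance matrix
    lam, U = np.linalg.eigh(S)             # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]         # re-sort in decreasing order
    B = U[:, :K] * np.sqrt(lam[:K])        # loadings on the first K factors
    resid_var = np.diag(S) - (B ** 2).sum(axis=1)
    Sigma = B @ B.T + np.diag(resid_var)   # factor part + diagonal residuals
    explained = lam[:K].sum() / lam.sum()  # share of variance of first K factors
    return Sigma, explained
```

By construction the diagonal of the structured estimate matches the sample variances, and taking K = N recovers the sample covariance matrix exactly.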
A sophisticated test by Connor and Korajczyk (1993) finds between four and seven factors for the NYSE and AMEX over 1967-1991, a finding roughly consistent with that of Roll and Ross (1980). Ledoit (1999) uses a five-factor model. In this paper, we select the relevant number of factors by applying some explicit results from the theory of random matrices (see Marchenko and Pastur, 1967). The idea is to compare the properties of an empirical covariance matrix (or, equivalently, a correlation matrix, since asset returns have been normalised to have zero mean and unit variance) to those of a null-hypothesis, purely random matrix such as one could obtain from a finite time series of strictly independent assets. It has been shown (for a recent reference, see Johnstone, 2001; for an application to finance, see Laloux et al., 1999) that, as N and T go to infinity with Q = T/N ≥ 1 fixed, the asymptotic density of eigenvalues of the correlation matrix of strictly independent assets reads:

ρ(λ) = (Q / 2π) √((λ_max − λ)(λ − λ_min)) / λ,   λ_min ≤ λ ≤ λ_max,

where λ_min = (1 − √(1/Q))² and λ_max = (1 + √(1/Q))².
Theoretically speaking, this result can be exploited to provide a formal test of the assumption that a given factor represents information rather than noise. The result is asymptotic, however, and cannot be taken at face value for a finite sample size. One of the most important features here is that the lower bound of the spectrum, λ_min, is strictly positive (except for T = N), and, therefore, there are no eigenvalues between 0 and λ_min. We use a conservative interpretation of this result to design a systematic decision rule, and decide to regard as statistical noise all factors associated with an eigenvalue lower than λ_max. In other words, we take K such that λ_K ≥ λ_max and λ_{K+1} < λ_max, where λ_1 ≥ λ_2 ≥ … ≥ λ_N are the eigenvalues of the empirical correlation matrix sorted in decreasing order.

A problem of a different nature comes from the non-stationarity of the data. Numerous empirical studies have highlighted, for example, the fact that the volatilities of asset classes are not constant over time, and such instability reduces the robustness of an optimisation in which the risk parameters are set equal to their past values. The dynamic character of the parameters renders the task of estimation more arduous, a challenge that can be addressed by the use of suitably designed statistical models such as GARCH models. Good modelling restores robustness to portfolio optimisation over a long period by relying no longer on the stability of the risk parameters themselves, but on the stability of the models that define the variation in the risk parameters (variance-covariance). While many techniques exist for better estimation of the variance-covariance matrix of asset returns, a major challenge remains: that of estimating the mean returns. It is for this reason that, as mentioned in the introduction, the recommended focus has often been on minimum-risk portfolios, whose derivation does not depend on any estimate of expected returns.
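The random-matrix decision rule for choosing K can be sketched as follows (a minimal illustration under the unit-variance normalisation discussed above, not the authors' own code): compute the Marchenko-Pastur edges for given T and N, then count the eigenvalues of the empirical correlation matrix that exceed the upper edge.

```python
import numpy as np

def mp_bounds(T, N):
    """Edges [lam_min, lam_max] of the Marchenko-Pastur spectrum for the
    correlation matrix of N independent unit-variance series of length T."""
    q = N / T
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

def n_factors(H):
    """Conservative factor count: eigenvalues above the random-matrix upper
    edge are treated as information, everything below as statistical noise."""
    T, N = H.shape
    X = (H - H.mean(axis=0)) / H.std(axis=0)  # normalise: zero mean, unit variance
    C = X.T @ X / T                           # empirical correlation matrix
    lam = np.linalg.eigvalsh(C)
    _, lam_max = mp_bounds(T, N)
    return int((lam > lam_max).sum())
```

For a panel of series driven by one strong common factor, the largest correlation eigenvalue sits far above λ_max and the rule keeps at least one factor, while the bulk of the spectrum is discarded as noise.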
• Noël Amenc is director of the EDHEC Risk and Asset Management Research Centre, and Lionel Martellini is its scientific director. This article is based on research included in the EDHEC publication 'The Impact of IFRS and Solvency II on Asset-Liability Management and Asset Management of Insurance Companies'.