Aman Ullah

Print publication date: 2004

Print ISBN-13: 9780198774471

Published to British Academy Scholarship Online: August 2004

DOI: 10.1093/0198774478.001.0001

Appendix A Statistical Methods

Source:
Finite Sample Econometrics
Publisher:
Oxford University Press

The finite sample theory of econometrics depends heavily on several statistical concepts. These include moments, distributions, and asymptotic expansions. Accordingly, the objective here is to present results that are useful for the finite sample results covered in this book. In doing so it is assumed that the reader has a basic knowledge of probability and statistics.

A.1 Moments and Cumulants

The characteristic function of a random variable y is

$Display mathematics$
where μr′ = Ey r and the last equality is the expansion of ψ(t) around zero. Then the rth moment around zero of y is
$Display mathematics$
where ψ(r)(0) is the rth derivative of ψ(t) with respect to t and evaluated at t = 0.
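As a numerical illustration of μr′ = ψ(r)(0)/i r, one can use the standard normal, whose characteristic function is ψ(t) = exp(−t 2/2) (a standard fact, not stated in the text). A minimal sketch with central finite differences:

```python
import math

# Characteristic function of the standard normal: psi(t) = exp(-t^2 / 2).
def psi(t):
    return math.exp(-t * t / 2.0)

h = 1e-4  # step size for central finite differences
d1 = (psi(h) - psi(-h)) / (2.0 * h)                 # ~ psi'(0)
d2 = (psi(h) - 2.0 * psi(0.0) + psi(-h)) / (h * h)  # ~ psi''(0)

mu1 = abs(d1)  # |psi'(0) / i|: first moment about zero, 0 for N(0, 1)
mu2 = -d2      # psi''(0) / i^2 = -psi''(0): second moment about zero, 1 for N(0, 1)
```

The recovered moments 0 and 1 agree with the mean and second moment of N(0, 1), as the formula requires.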

The cumulant function is defined by

$Display mathematics$
Using the expansion of K(t) around 0
$Display mathematics$
where
$Display mathematics$
is the rth cumulant of y.

It is easy to verify that κ1 = μ1′, κ2 = μ2, κ3 = μ3, $κ 4 = μ 4 - 3 μ 2 2$, κ5 = μ5 − 10μ3μ2, $κ 6 = μ 6 - 15 μ 4 μ 2 - 10 μ 3 2 + 30 μ 2 3$, where μr = E(y − Ey)r is the rth central moment (the moment around the mean).
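These moment–cumulant identities can be checked by plugging in the central moments of a normal distribution (μ2 = σ2, μ3 = μ5 = 0, μ4 = 3σ4, μ6 = 15σ6, standard facts not stated in the text), for which every cumulant beyond the second must vanish:

```python
# Central moments of N(0, s2): mu2 = s2, mu3 = mu5 = 0, mu4 = 3 s2^2,
# mu6 = 15 s2^3. Plugging these into the identities above, every cumulant
# beyond the second should come out zero.
s2 = 2.5  # an arbitrary illustrative variance
mu2, mu3, mu4, mu5, mu6 = s2, 0.0, 3 * s2**2, 0.0, 15 * s2**3

kappa3 = mu3
kappa4 = mu4 - 3 * mu2**2
kappa5 = mu5 - 10 * mu3 * mu2
kappa6 = mu6 - 15 * mu4 * mu2 - 10 * mu3**2 + 30 * mu2**3
# kappa3 = kappa4 = kappa5 = kappa6 = 0, as the normal distribution requires
```

The cancellation in κ6 (15 − 45 + 30 = 0) is exact, which is a useful internal consistency check on the signs of the identity.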

A.2 Gram–Charlier and Edgeworth Series

The Gram (1879) and Charlier (1905) series represents the density of a standardized variable y as a linear combination of the standard normal density φ(y) and its derivatives. That is

$Display mathematics$
where the c j are constants and
$Display mathematics$
H j(y) is a polynomial in y of degree j, namely the coefficient of t j/j! in exp(ty − ½t 2); for example, H 0(y) = 1, H 1(y) = y, H 2(y) = y 2 − 1, H 3(y) = y 3 − 3y, and H 4(y) = y 4 − 6y 2 + 3. These H j(y) form an orthogonal set of polynomials (Hermite polynomials) with respect to the normal density φ(y), that is
$Display mathematics$
Because of this, multiplying f(y) by H j(y) and integrating term by term we get
$Display mathematics$
These c j can be obtained in terms of the moments of y, and these are
$Display mathematics$
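The Hermite polynomials and the orthogonality relation used above can be verified numerically. The sketch below builds H j(y) from the standard recurrence H j+1(y) = yH j(y) − jH j−1(y) (a well-known identity, not stated explicitly in the text) and checks two inner products against the normal weight:

```python
import math

def hermite(j, y):
    # Probabilists' Hermite polynomials via H_{j+1}(y) = y H_j(y) - j H_{j-1}(y)
    if j == 0:
        return 1.0
    h_prev, h = 1.0, y
    for k in range(1, j):
        h_prev, h = h, y * h - k * h_prev
    return h

def phi(y):
    # Standard normal density
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

# Numerical orthogonality check on [-8, 8] (the tails beyond are negligible):
# the integral of H_2 H_4 phi is 0, and the integral of H_3^2 phi is 3! = 6.
n, a = 16000, -8.0
h = -2.0 * a / n
grid = [a + i * h for i in range(n + 1)]
ip24 = sum(hermite(2, y) * hermite(4, y) * phi(y) * h for y in grid)
ip33 = sum(hermite(3, y) ** 2 * phi(y) * h for y in grid)
```

The value 3! for the squared norm of H 3 matches the general result that the squared norm of H j under φ is j!.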
Then the Gram–Charlier series of Type A can be written as
$Display mathematics$
Using this
$Display mathematics$
If y is not a standardized variable then
$Display mathematics$
where (j)r = j(j − 1) ⋯ (j − r + 1). Then
$Display mathematics$
In this case
$Display mathematics$

The Edgeworth (1905) Type A series is closely related to the Gram–Charlier series. To obtain it we derive the characteristic function around the normal distribution and then use the inversion theorem to obtain the series expansion of the density. Let us write

$Display mathematics$
where c j is given by (A.10). Alternatively, (A.14) can be obtained by using (A.4) as
$Display mathematics$

Now using the inversion theorem

$Display mathematics$
and the fact that if f has the characteristic function ψ(t) then f (j) has the characteristic function (−it)jψ(t), which gives
$Display mathematics$
we obtain the Gram–Charlier type series expansion in (A.11). If we collect the terms containing elements not higher than H 6 we can write
$Display mathematics$
This is often called the Edgeworth form of the Type A series, see Kendall and Stuart (1977). Further, if cumulants above the fourth are neglected the Edgeworth series reduces to
$Display mathematics$

We note that the above series can also be written as

$Display mathematics$
where D = d/dy. This is the form originally suggested by Edgeworth (1905), also see Kendall and Stuart (1977). The idea behind these series goes back to Chebyshev (1890), also see Cramér (1925, 1928) for the historical details.
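The truncated Edgeworth density retains some exact properties of the full series, because every H j with j ≥ 1 integrates to zero against φ(y). A sketch, assuming the usual truncated form f(y) ≈ φ(y)[1 + (κ3/6)H 3(y) + (κ4/24)H 4(y) + (κ3²/72)H 6(y)] and hypothetical cumulant values (not from the text), checks that total mass, mean, and variance stay at 1, 0, and 1:

```python
import math

def hermite(j, y):
    # Probabilists' Hermite polynomials via H_{j+1}(y) = y H_j(y) - j H_{j-1}(y)
    if j == 0:
        return 1.0
    h_prev, h = 1.0, y
    for k in range(1, j):
        h_prev, h = h, y * h - k * h_prev
    return h

def phi(y):
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

k3, k4 = 0.3, 0.2  # hypothetical third and fourth cumulants

def edgeworth(y):
    # Edgeworth series truncated after the fourth cumulant
    return phi(y) * (1.0 + k3 / 6.0 * hermite(3, y)
                     + k4 / 24.0 * hermite(4, y)
                     + k3 * k3 / 72.0 * hermite(6, y))

# Numerical integrals on [-8, 8]; the correction terms integrate to zero
# against phi, so mass = 1, mean = 0, and variance = 1 are preserved.
n, a = 16000, -8.0
h = -2.0 * a / n
grid = [a + i * h for i in range(n + 1)]
mass = sum(edgeworth(y) * h for y in grid)
mean = sum(y * edgeworth(y) * h for y in grid)
var = sum(y * y * edgeworth(y) * h for y in grid)
```

Note that a truncated Edgeworth density can turn slightly negative in the tails; the moment properties above hold regardless.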

A.3 Asymptotic Expansion and Asymptotic Approximation

For large values of nonstochastic x, the series

$Display mathematics$
is the asymptotic expansion of a function f(x) if the coefficients are determined as follows:
$Display mathematics$

The series on the right, viz.,

$Display mathematics$
may be convergent for large values of x or divergent for all values of x.

However, it should be noted that the difference between f(x) and the sum of the n terms of its asymptotic expansion:

$Display mathematics$
is of the same order as the (n + 1)th term when |x| is large. The asymptotic expansion may therefore be more suitable for approximate numerical computation than a convergent series.

Let us illustrate this point with the help of the following example, from Whittaker and Watson (1965: 150–151).

Consider the function

$Display mathematics$
where x is real and positive. By repeated integration by parts, we get
$Display mathematics$
We observe that the absolute value of the ratio of the (m + 1)th term to the mth term is equal to
$Display mathematics$
which tends to ∞ as m → ∞ for all values of x. It follows that the series expansion of f(x) is, in fact, divergent for all values of x. In spite of this, however, the series can be used for the calculation of f(x). This may be seen as follows. Write
$Display mathematics$
and
$Display mathematics$
such that
$Display mathematics$
Then, because e x−t ≤ 1 for t ≥ x
$Display mathematics$
This is very small (for any value of n) for sufficiently large values of x. It follows, therefore, that the value of the function f(x) can be calculated with great accuracy for large values of x. Even for small values of x and n
$Display mathematics$
and
$Display mathematics$
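Assuming the standard form of this Whittaker–Watson example, f(x) = ∫x∞ e x−t t −1 dt with asymptotic series Σ (−1)m m! x −m−1 (the displays above are not reproduced, so this is a reconstruction under that assumption), both claims can be checked numerically: the series diverges for every x, yet a short partial sum approximates f(x) within the first omitted term when x is large:

```python
import math

def f_numeric(x, upper=40.0, n=80000):
    # f(x) = int_x^oo e^{x-t} t^{-1} dt = int_0^oo e^{-u} / (x + u) du,
    # evaluated by the trapezoidal rule (the tail beyond `upper` is ~e^{-40})
    h = upper / n
    total = 0.5 * (1.0 / x + math.exp(-upper) / (x + upper))
    for i in range(1, n):
        u = i * h
        total += math.exp(-u) / (x + u)
    return total * h

def partial_sum(x, n):
    # First n terms of the asymptotic series: sum_m (-1)^m m! / x^{m+1}
    return sum((-1) ** m * math.factorial(m) / x ** (m + 1) for m in range(n))

x, n = 10.0, 5
err = abs(partial_sum(x, n) - f_numeric(x))
next_term = math.factorial(n) / x ** (n + 1)  # first omitted term: 120 / 10^6
# err < next_term: five terms already give ~4-digit accuracy at x = 10,
# even though partial_sum(2.0, 25) is astronomically large (divergence).
```

This is exactly the practical point of the passage: the error after n terms is bounded by the first omitted term, so stopping early gives high accuracy for large x despite divergence.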

It has also been shown by Whittaker and Watson (1965: 153, section 8.31) that it is permissible to integrate an asymptotic expansion term by term, the resulting series being the asymptotic expansion of the integral of the function represented by the original series. It has also been stated that a given series can be an asymptotic expansion of several distinct functions; however, a given function cannot be represented by more than one asymptotic expansion, see Copson (1967), Kendall and Stuart (1977), and Srinivasan (1970).

A.3.1 Asymptotic Expansion (Stochastic)

Now we consider the case of stochastic asymptotic expansion. Most econometric estimators and test statistics can be expanded in a power series in n −1/2 with coefficients that are well behaved random variables. Suppose, for example, Z n is an estimator or test statistic whose stochastic expansion is

$Display mathematics$
where ξj = O p(n −j) and T n, A n, and B n are sequences of random variables with limiting distributions as n tends to infinity. If R n is stochastically bounded, that is, P[|R n| > c] < ε as n → ∞ for every ε > 0 and a constant c, then the limiting distribution of $√n ( Z n - ξ 0 )$ is the same as the limiting distribution of A n. The expansion in (A.19) is then the asymptotic expansion of Z n, see Chapter 3 for examples.

A.4 Moments of the Quadratic Forms Under Normality

Let N i, for i = 1, 2, 3, 4, be symmetric matrices. Further consider an n × 1 vector y, which is normally distributed with mean vector Ey = μ and covariance matrix V(y) = Σ. Then the following results can be verified from the result of Lemma 1 and Exercises 2 and 4 in Chapter 2.

To write these results we first introduce the notation given below:

$Display mathematics$
where tr represents the trace of a matrix. Then
$Display mathematics$
$Display mathematics$
$Display mathematics$
The above results can also be developed from Mathai and Provost (1992), where the moment generating function approach is used.

In the special case where μ = 0, that is y ∼ N(0, Σ), the above results reduce to the following.

$Display mathematics$
$Display mathematics$
$Display mathematics$
When y ∼ N(0, I) then these results in (A.22) remain the same except that Σ = I in all the terms. For alternative derivations of these results see Magnus (1978, 1979), Srivastava and Tiwari (1976), and Mathai and Provost (1992).

In another special case where y ∼ N(0, I) and N 1 is an idempotent matrix of rank m, then y′N 1 y is distributed as a central χ2 with m d.f. In this case $tr ( N 1 i ) = m$, and

$Display mathematics$
which generalizes to
$Display mathematics$
This is the rth moment of a central χ2 distribution.
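The rth moment of a central χ2 with m d.f. has the product form m(m + 2) ⋯ (m + 2r − 2), which can be cross-checked against the equivalent gamma-function form 2 r Γ(m/2 + r)/Γ(m/2) (a standard identity; the specific m, r values below are illustrative only):

```python
import math

def moment_product(m, r):
    # m (m + 2) (m + 4) ... (m + 2r - 2)
    out = 1.0
    for j in range(r):
        out *= m + 2 * j
    return out

def moment_gamma(m, r):
    # E (chi^2_m)^r = 2^r Gamma(m/2 + r) / Gamma(m/2)
    return 2.0 ** r * math.gamma(m / 2.0 + r) / math.gamma(m / 2.0)

# The two forms agree exactly, e.g. m = 3, r = 3: 3 * 5 * 7 = 105
checks = [(moment_product(m, r), moment_gamma(m, r))
          for m in (1, 3, 8) for r in (1, 2, 3, 4)]
```

For r = 1 this reduces to the familiar E(χ2 m) = m, and for r = 2 it gives m(m + 2), so Var(χ2 m) = 2m.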

When the matrix N 1 is not symmetric we can write y′N 1 y = y′((N 1 + N 1′)/2)y, where (N 1 + N 1′)/2 is always a symmetric matrix. Thus the above results also go through when the matrices N 1 to N 4 are not symmetric.

A.5 Moments of Quadratic Forms Under Nonnormality

Let y = (y 1, …, y n)′ be an n × 1 vector of i.i.d. elements with

$Display mathematics$
for i = 1, …, n, where γ1 and γ2 are Pearson's measures of skewness and kurtosis of the distribution; these, together with γ3 and γ4, can be regarded as measures of deviation from normality. For normal distributions the parameters γ1, γ2, γ3, and γ4 are zero, while for symmetrical distributions only γ1 and γ3 are zero. These γ's can also be expressed as cumulants; for example, γ1 and γ2 represent the third and fourth cumulants, see Section A.1.

Under the above assumptions the following results follow, where N 1 and N 2 matrices are not assumed to be symmetric, ι is an n × 1 vector of unit elements and * represents the Hadamard product:

$Display mathematics$
Setting γ1, γ2, γ3, and γ4 equal to zero, we obtain the results for normally distributed disturbances given above. For derivations and applications, see Chandra (1983) and Ullah, Srivastava, and Chandra (1983).

We also note that E(y′N 1 y·yy′) above gives the result for E((y′N 1 y)(y′N 2 y)) = tr[N 2 E(y′N 1 y·yy′)]. Similarly E[(y′N 1 y)(y′N 2 y)yy′] gives the result for E((y′N 1 y)(y′N 2 y)(y′N 3 y)) = tr[N 3 E(y′N 1 y·y′N 2 y·yy′)]. Further, for the case where the mean of y is a vector Ey = μ, the above results can be extended by writing, say, y′N 1 y = (y − μ + μ)′N 1(y − μ + μ) = (y − μ)′N 1(y − μ) + μ′N 1μ + 2(y − μ)′N 1μ. Then E(y′N 1 y) = E((y − μ)′N 1(y − μ)) + μ′N 1μ = σ2 tr(N 1) + μ′N 1μ.
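The mean formula E(y′N 1 y) = σ2 tr(N 1) + μ′N 1μ can be checked by simulation. A sketch with one hypothetical choice of N 1, μ, and σ (none of these values come from the text):

```python
import random

random.seed(12345)

# Hypothetical illustrative values
N1 = [[2.0, 1.0, 0.0],
      [1.0, 3.0, 1.0],
      [0.0, 1.0, 1.0]]  # symmetric 3 x 3
mu = [1.0, -0.5, 2.0]
sigma = 1.0

def quad_form(v):
    # v' N1 v
    return sum(v[i] * N1[i][j] * v[j] for i in range(3) for j in range(3))

# Monte Carlo estimate of E(y' N1 y) with y = mu + sigma * standard normal
reps = 100000
total = 0.0
for _ in range(reps):
    y = [mu[i] + sigma * random.gauss(0.0, 1.0) for i in range(3)]
    total += quad_form(y)
mc_mean = total / reps

# Theoretical value: sigma^2 tr(N1) + mu' N1 mu = 6 + 3.75 = 9.75
theory = sigma**2 * sum(N1[i][i] for i in range(3)) + quad_form(mu)
```

With 100,000 replications the Monte Carlo mean lands well within simulation noise of 9.75, which is consistent with the formula.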

Now consider the case where the elements y i are non-i.i.d. such that

$Display mathematics$
for i, j, k, l = 1, …, n. Define an n × n matrix Θk = ((σijk)) and another n × n matrix Δkl = ((σijkl)) for i, j = 1, …, n. Further denote
$Display mathematics$
Then
$Display mathematics$

A.6 Moment of Quadratic Form of a Vector of Squared Nonnormal Random Variables

Consider an n × 1 random vector

$Display mathematics$
where y is an n × 1 random vector with Ey = 0, V(y) = σ2 I n, and M is an n × n idempotent matrix of rank r.

Let us write e = (e 1, …, e n)′, where e i = m i y and m i is the 1 × n ith row of the matrix

$Display mathematics$
i, j = 1, …, n. Further denote $e ˙ = ( e 1 2 , … , e n 2 ) ′$ as an n × 1 vector of the squares of e i and $M ˙ = ( ( m i j 2 ) )$ as an n × n matrix of the squares of the elements of M. Then, for a matrix N = ((n ij))
$Display mathematics$
and when M˙ ≃ I n.
$Display mathematics$
For the proof of this write
$Display mathematics$
Then using the results in (A.25) we get the result in (A.29).

When y ∼ N(0, σ2 I) then

$Display mathematics$

An application of this result occurs in the linear regression model y = Xβ + u, where X is an n × k matrix. In this case e = My = Mu, where M = I − X(X′X)−1 X′ is an idempotent matrix of rank n − k. Several tests of heteroskedasticity are expressible as a quadratic form of ė, see Chapter 5.

A.7 Moments of Quadratic Forms in Random Matrices

Let us consider a stochastic n × M matrix Y = ((y it)) = (y 1, …, y M), where t = 1, …, n, i = 1, …, M, and y i is an n × 1 vector. We assume that, for all i and t,

$Display mathematics$
where j, k, e = 1, …, M.

Define

$Display mathematics$
so that σijk and γijke are 0 for normally distributed disturbances.

If N 1 is any nonstochastic matrix, we have

$Display mathematics$
Further denoting (1/n)E(Y′Y) = Σ = ((σij)) and considering N 1 of appropriate dimensions we have
$Display mathematics$
Now introducing the following notations:
$Display mathematics$
and considering N 1 and N 2 of appropriate dimensions we have
$Display mathematics$
For the following results we introduce additional matrix representations:
$Display mathematics$
where n 1i, n 2i, and n 3i are now vectors for i = 1, …, M. We also use N 1 = ((n 1ij)), N 2 = ((n 2ij)) and N 3 = ((n 3ij)). Then
$Display mathematics$
where M = ((m ij)) and the m ij for M 1 to M 8 are, respectively,
$Display mathematics$

The above results simplify for the normal distribution by using σijk = 0 and γijke = 0. For the applications, see Ullah and Srivastava (1994), Ullah (2002), Srivastava and Maekawa (1995), among others. These results are useful in developing the moments of various econometric statistics under nonnormal errors, also see Lieberman (1997).

Let y ∼ N(μ, Σ) be an n × 1 normal vector with Ey = μ and V(y) = Σ. Further consider N to be an n × n nonstochastic matrix, b an n × 1 nonstochastic vector, and c a scalar. Then

$Display mathematics$
if and only if r = Rank(NΣ), ΣNΣNΣ = ΣNΣ, Σ(b + Nμ) = ΣNΣ(b + Nμ), and θ = c + 2b′μ + μ′Nμ. The $χ r 2 ( θ )$ represents the noncentral chi-square distribution with d.f. r and noncentrality parameter θ. For θ = 0, $χ r 2 ( θ ) = χ r 2$ becomes a central chi-square with d.f. r, see Srivastava and Khatri (1979: 64), Rao (1973), and Mathai and Provost (1992).

A necessary and sufficient condition for

$Display mathematics$
is that ΣNΣNΣ = ΣNΣ with d.f. = r = Rank of NΣ, and θ = μ′Nμ. For μ = θ = 0, $y * ∼ χ r 2$, which is a central chi-square.

If |Σ| ≠ 0 then the above necessary and sufficient condition becomes NΣN = N with d.f. r = Rank of NΣ. This implies that NΣ, or equivalently Σ1/2 NΣ1/2, is an idempotent matrix of rank r, where Σ = Σ1/2Σ1/2, see Rao (1973: 188) and Srivastava and Khatri (1979: 64).
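The conditions ΣNΣNΣ = ΣNΣ and NΣN = N can be verified mechanically for a concrete pair. The sketch below uses hypothetical choices (not from the text): Σ = 2I 3 and N = M/2, where M = (1/3)ιι′ is the projection onto the unit vector ι (idempotent, rank 1), so NΣ = M is idempotent and y′Ny would be χ2 with 1 d.f. when y ∼ N(0, Σ):

```python
# Plain-list matrix multiplication for a small deterministic check
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Sigma = [[2.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
M = [[1.0 / 3.0] * 3 for _ in range(3)]          # idempotent projection, rank 1
N = [[M[i][j] / 2.0 for j in range(3)] for i in range(3)]

SNS = matmul(matmul(Sigma, N), Sigma)            # Sigma N Sigma
SNSNS = matmul(matmul(SNS, N), Sigma)            # Sigma N Sigma N Sigma
NSN = matmul(matmul(N, Sigma), N)                # N Sigma N
# SNSNS equals SNS and NSN equals N, so both forms of the condition hold
```

Since NΣ = M has rank 1, the associated quadratic form has 1 d.f., consistent with r = Rank(NΣ).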

A necessary and sufficient condition that the vector N 1 y and the quadratic form (y − μ)′N(y − μ) are statistically independent is ΣNΣN 1 = 0, or NΣN 1 = 0 if |Σ| ≠ 0, where N 1 is an n × n nonstochastic matrix. Similarly (y − μ)′N 1(y − μ) and (y − μ)′N(y − μ) are independent if ΣN 1ΣNΣ = 0, or N 1ΣN = 0 if |Σ| ≠ 0.

A.8.1 Density and Moments of a Noncentral Chi-square Variable

Let y ∼ N(μ, Σ). Then

$Display mathematics$
where N is assumed to be an idempotent matrix of rank r, θ = (1/2)μ′Σ−1/2 NΣ−1/2μ and Σ−1/2 y ∼ N(Σ−1/2μ, I). The density function of the noncentral χ2 variable y* is
$Display mathematics$
Further the sth inverse moment of y* is
$Display mathematics$
which gives, for s = 1,2, …,
$Display mathematics$
see Ullah (1974).

Using the derivatives of the confluent hypergeometric function in Slater (1960: 15, eq. 2.1.8) we have

$Display mathematics$
for m = 1,2,….

When μ = 0, so that θ = 0, the density of y* given above reduces to the density of a central $χ r 2$. Further

$Display mathematics$

We note that the distribution of y′Ny = y′Σ−1/2Σ1/2 NΣ1/2Σ−1/2 y is not a noncentral χ2 distribution unless Σ1/2 NΣ1/2 is idempotent. If Σ1/2 NΣ1/2 is not idempotent the sth inverse moment of y′Ny is as given in Chapter 2.

A.8.2 Moment Generating Function and Characteristic Function

Let y ∼ N(μ, Σ), where Σ = PP′, and let N 1, N 2, …, N m be m symmetric n × n matrices. Then the joint moment generating function of y′N 1 y, …, y′N m y is

$Display mathematics$
where C = P′(t 1 N 1 + ··· + t m N m)P and μ0 = P−1μ, see Magnus (1986) and Mathai and Provost (1992). For m = 2 this result is given in Chapter 2, and it has been used in Sawa (1972), Magnus (1986), and Mathai and Provost (1992) to obtain the moments of the product and ratio of quadratic forms. For example, if y 1* = y′N 1 y and y 2* = y′N 2 y, then
$Display mathematics$
For the moments of the ratio of y 1*/y 2*, see Chapter 2.

Now we consider the vector y distributed nonnormally with Ey = μ and V(y) = Σ, which is a diagonal matrix of $σ i 2$, i = 1, …, n. If we let $φ ( y i | μ i , σ i 2 )$ be a normal density with mean μi and variance $σ i 2$ then the Edgeworth or Gram–Charlier series expansion of the density f(y i) in Section A.2 can be written as (Davis 1976)

$Display mathematics$
where $E z i ( z i r ) = ( - 1 ) r r ! c r$ and c r is as in (A.10).

Since y|z ∼ N(μ + z, Σ), using the Davis (1976) technique the characteristic function (c.f.) or moment generating function (m.g.f.) under the Edgeworth density can be obtained in two steps. First find the c.f. for the normal case. Second, take the expectation of this c.f. with respect to z. With this approach Knight (1985) provided the c.f. of a linear form a′y and the quadratic form y′Ny with corrections for skewness and kurtosis, that is, the first four terms of the Edgeworth expansion.

These results on the c.f. and m.g.f. provide the moments of the products and ratio of quadratic forms under the Edgeworth density of y. For applications, see Knight (1985, 1986) for the moments and distribution of the 2SLS estimator and Peters (1989) for the moments of the LS estimator in a dynamic regression model.

A.8.3 Density Function Based on Characteristic Function

When the absolute value of the c.f. ψ(t) = ψ(t 1, …, t n) is integrable then the density function f(y) exists and is continuous for all y, and it is given by

$Display mathematics$

This is known as the uniqueness theorem or inversion theorem for the c.f., see Cramér (1946).

Next consider the variable q, which is the ratio of two random variables Y and X, q = Y/X. Let ψ(t 1, t 2) be the c.f. of (Y,X). Then the density of q is given by

$Display mathematics$
see Cramér (1946). Phillips (1985) generalizes this result to matrix quotients Q = X −1 Y, where X is a k × k positive definite matrix whose determinant has finite expectation and Y is a k × l matrix. As an application of this result Phillips (1985) shows that the LS regression coefficient matrix for a multivariate normal sample has a matrix t-distribution.

A.9 Hypergeometric Functions

Here we present well-known power series that are used in the text.

A power series

$Display mathematics$
is known as the confluent hypergeometric function or Kummer's series, see Slater (1960: 2). It has been used extensively in finite sample econometrics, see Ullah (1974), Sawa (1972), and Phillips (1983), among others. Also, see Abadir (1999) for an introduction to hypergeometric function for economists.

Another power series, the hypergeometric function, is written as

$Display mathematics$
for |c| > 0 and |x| < 1, see Slater (1960), and Ullah and Nagar (1974).
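Both series can be evaluated by direct term-by-term summation. The sketch below is a standard implementation (not from the text), checked against two classical identities: 1F1(a; a; x) = e x and 2F1(1, 1; 2; x) = −log(1 − x)/x for |x| < 1:

```python
import math

def hyp1f1(a, c, x, terms=80):
    # 1F1(a; c; x) = sum_k (a)_k / (c)_k * x^k / k!
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff
        coeff *= (a + k) / (c + k) * x / (k + 1)  # ratio of consecutive terms
    return total

def hyp2f1(a, b, c, x, terms=200):
    # 2F1(a, b; c; x) = sum_k (a)_k (b)_k / (c)_k * x^k / k!, for |x| < 1
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff
        coeff *= (a + k) * (b + k) / (c + k) * x / (k + 1)
    return total
```

Updating each term from the previous one via the Pochhammer ratio avoids computing factorials directly and is numerically stable inside the radius of convergence.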

Now we consider the integral of a function that has a power series expansion in terms of hypergeometric functions, see Sawa (1972). This is, for 0 ≤ k < 1 and p > 1

$Display mathematics$
where
$Display mathematics$

For k = 1 and p − q > 1

$Display mathematics$

To derive the above result we first use the change of variables t = 1/(1 − 2x). Then

$Display mathematics$

Then using the binomial expansion of [1 − k(1 − t)]q and integrating term by term gives the above result; see Sawa (1972).

A.9.1 Asymptotic Expansion

For a, c > 0, and x > 0 we have

$Display mathematics$
see Copson (1948: 265), Erdelyi (1956), Slater (1960), and Sawa (1972). For large x this gives the asymptotic expansion up to O(x −(r − 1)). Using this in the G function we get the asymptotic expansion, up to O(x −4), as
$Display mathematics$

A.10 Orders of Magnitude (Small o and Large O)

Here we define the order of magnitude of a sequence, say {X n}, by examining the behavior of X n for large n.

Definition 1 The sequence {X n} of real numbers is said to be at most of order n k and is denoted by

$Display mathematics$
as n → ∞, for some constant c > 0. Further if {X n} is a sequence of random variables then it is said to be at most of order n k in probability,
$Display mathematics$
if, as n → ∞,
$Display mathematics$
where c n is a nonstochastic sequence.

Definition 2 The sequence {X n} of real numbers is said to be of smaller order than n k and is denoted by

$Display mathematics$
as n → ∞. Further if {X n} is stochastic then
$Display mathematics$
if
$Display mathematics$

In the above definitions k can take any real value (positive or negative). Also the order of magnitude is almost sure if the convergence of the sequence is almost sure.

As an example, consider a stochastic sequence

$Display mathematics$
where EX i = μ and V(X i) = σ2. Then using Chebyshev's inequality, the sequence X n is bounded in probability in the sense that P[|X n| > ε] ≤ 1/ε2 as n → ∞. Thus X n = O p(1) and X̄ − μ = O p(n −1/2).
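The claim X̄ − μ = O p(n −1/2) can be illustrated by simulation. A sketch with a hypothetical setup (X i ∼ U(0, 1), so μ = 1/2 and σ = √(1/12) ≈ 0.289; none of these choices come from the text): if the claim holds, the spread of √n(X̄ − μ) should stabilize near σ rather than grow with n:

```python
import math
import random

random.seed(2024)

def rms_scaled_dev(n, reps):
    # Root-mean-square of sqrt(n) * (sample mean - mu) over `reps` replications,
    # with X_i ~ U(0, 1) and mu = 1/2
    devs = []
    for _ in range(reps):
        xbar = sum(random.random() for _ in range(n)) / n
        devs.append(math.sqrt(n) * (xbar - 0.5))
    return math.sqrt(sum(d * d for d in devs) / reps)

# Both spreads sit near sigma = sqrt(1/12) ~ 0.289 despite the 100-fold
# change in n, which is the meaning of Xbar - mu = O_p(n^{-1/2})
spread_small = rms_scaled_dev(50, 300)
spread_large = rms_scaled_dev(5000, 300)
```

If instead one scaled by n rather than √n, the spread would grow without bound, showing the exponent −1/2 is the right rate.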

Orders of magnitude satisfy the following properties.

If X n = O(n k) and Y n = O(n l) then

1. X n Y n = O(n k+l).

2. $X n r = O ( n r k )$.

3. X n + Y n = O(n l 0), l 0 = Max(k, l).

The same results hold for small o in place of capital O. Further, if X n = O(n k) and Y n = o(n l), then

1. X n + Y n = O(n k).

2. X n Y n = O(n k+l).
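These order properties can be illustrated numerically with hypothetical sequences (not from the text): X n = 3n 2 + n = O(n 2) and Y n = 5n 3 = O(n 3), so k = 2 and l = 3. Each ratio below stays bounded as n grows, which is exactly what the O(·) statements assert:

```python
# Hypothetical sequences: X_n = 3 n^2 + n = O(n^2), Y_n = 5 n^3 = O(n^3)
def X(n):
    return 3 * n**2 + n

def Y(n):
    return 5 * n**3

ns = [10, 100, 1000, 10**6]

prod_ratios = [X(n) * Y(n) / n**5 for n in ns]   # X_n Y_n = O(n^{k+l}), k+l = 5
power_ratios = [X(n)**3 / n**6 for n in ns]      # X_n^r = O(n^{rk}), r = 3
sum_ratios = [(X(n) + Y(n)) / n**3 for n in ns]  # X_n + Y_n = O(n^{max(k,l)})
# prod_ratios -> 15, power_ratios -> 27, sum_ratios -> 5 as n grows
```

The ratios converge to finite constants (15, 27, and 5 respectively), so in each case the stated power of n is the exact order, not just an upper bound.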