
Eigenvectors and eigenvalues

Given a linear space V over some collection of scalars, a linear map (V: f |V) is known as an endomorphism of V. Given such an endomorphism, a non-zero vector v in V is described as an eigenvector of f, with scalar k as its eigenvalue, precisely if f(v) = k.v.

Characteristic polynomial

Inevitably, any scalar multiple of such a v is also an eigenvector of f, with eigenvalue k. In a one-dimensional space, any non-zero v has {v.x: x is scalar} equal to the whole space, hence every member of the space is some scalar multiple of v; for given u and non-zero v in the space, write u/v for the relevant scalar. For any endomorphism f of a one-dimensional space, any non-zero v is an eigenvector with eigenvalue f(v)/v and all members of the space are scalar multiples of v, hence f is simply scaling by f(v)/v.

In particular, for any finite-dimensional space V of dimension n we can construct the alternating space Alt(n, V) = {bulk(^, h): (V: h |n) is a list}, wherein ^ is the antisymmetrising tensor product; any basis (V:b|n) expresses the entries in each list (V: h |n) as linear sums of the outputs of b, hence bulk(^, h) as a sum of scalar multiples of bulk(^, b); so every member of Alt(n, V) is a scalar multiple of bulk(^, b) and Alt(n, V) is one-dimensional. Furthermore, for any (V:|n) basis b and list h, (: bulk(^, h −x.b)/bulk(^, b) ←x :{scalars}) is a polynomial of degree n (with leading-order coefficient ±1).

For a linear map (V: f |V) with V of dimension n, the determinant of f is defined to be the map f induces on Alt(n, V), namely det(f) = (Alt(n, V): q.bulk(^, f&on;b) ←q.bulk(^, b); scalar q |Alt(n, V)). When f has a non-zero eigenvector v with eigenvalue k, f −k.V (where V is synonymous with its own identity) maps non-zero v to zero, hence its kernel is more than {0} and it's singular (i.e. not invertible). Since, for any basis (V:b|n), v is some linear combination of the entries in b, giving f(v) as a linear combination of the entries in f&on;b, and 0 = f(v) −k.v as a linear combination of the entries in (f −k.V)&on;b, bulk(^) maps this last to zero; thus det(f −k.V) is the zero linear map on Alt(n, V) and k is one of the roots of the polynomial, of degree at most n,

χ(f) = (: det(f −x.V) ←x :{scalars})

This is known as the characteristic polynomial (the Greek letter χ, called chi, is the consonant that starts characteristic) of the endomorphism f. It takes value zero at the eigenvalue of every non-zero eigenvector of f, as just shown; so consider an arbitrary k for which χ(f, k) is zero. With our given basis (V: b |n), this says bulk(^) maps (f −k.V)&on;b to zero, which says there is some linear dependence among the entries in this list; let that be given by ({scalars}: z |n) for which sum(: z(i).(f(b(i)) −k.b(i)) ←i |n) = 0, i.e. sum(z.f&on;b) = k.sum(z.b), with at least some z(i) non-zero. Then f(sum(z.b)) = sum(z.f&on;b) = k.sum(z.b) and sum(z.b) is an eigenvector of f, with eigenvalue k; since z has at least some non-zero entries, sum(z.b) is a non-zero eigenvector. Thus every root of χ(f) is an eigenvalue of f, just as every eigenvalue of f is a root of χ(f). Define, for given scalar k and linear map (V: f |V):

The algebraic multiplicity
of k as a root of χ(f) is the number of factors of (: x−k ←x :) by which one must divide χ(f) to get a polynomial that doesn't have a zero at k; and
The geometric multiplicity
of k as an eigenvalue of f is the dimension of kernel(f −k.V).

We've just seen that if either multiplicity is non-zero then so is the other; however, these two multiplicities are not, in general, equal.
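To see the gap concretely, here is a minimal numpy sketch using a shear map of my own choosing (not an example from the text): the eigenvalue 2 has algebraic multiplicity two but geometric multiplicity only one.

```python
import numpy as np

# a shear: characteristic polynomial (x - 2).(x - 2)
f = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# algebraic multiplicity of 2 is 2: np.poly gives the characteristic
# polynomial's coefficients, here x.x - 4.x + 4 = (x - 2).(x - 2)
assert np.allclose(np.poly(f), [1.0, -4.0, 4.0])

# geometric multiplicity is dim kernel(f - 2.V); f - 2.V has rank 1
# in a 2-dimensional space, so the kernel is only 1-dimensional
geometric = 2 - np.linalg.matrix_rank(f - 2.0 * np.eye(2))
assert geometric == 1
```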

Eigenbasis

When the characteristic polynomial χ(f) of a linear map (V: f |V), with V of finite dimension n, has n distinct roots, we get n eigenvectors; we might reasonably hope these would form a basis. Given that most polynomials of degree n do have n distinct roots, this would ensure that linear maps usually have bases of eigenvectors. When such a basis exists, it is known as an eigenbasis for the linear map.

Consider any linear space V of finite dimension n and endomorphism f of V whose characteristic polynomial has n distinct roots. For each root, we have at least one eigenvector, so let ({scalars}: k |n) and (V: e |n) be lists of eigenvalues and non-zero eigenvectors, such that f(e(i)) = k(i).e(i) for each i in n. Let U = span(e) be of dimension m. Any member of U is sum(z.e) for some ({scalars}: z :n) and f(sum(z.e)) is just sum(z.k.e), which is also in span(e) = U, so (U:f|U) is an endomorphism of U; its characteristic polynomial has degree m, so has at most m roots; but each k(i) is an eigenvalue of (U:f|U) and these are n distinct roots of that characteristic polynomial, so m ≥ n. However, m is the dimension of U = span(e) and e has only n entries, so m ≤ n, with equality precisely if e is linearly independent; combined with m ≥ n, this gives m = n and e linearly independent. We thus have U = span(e) of dimension n as a subspace of V of dimension n, so U = V and e spans V; with only n entries spanning V, e is a basis of V.

So when the characteristic polynomial has enough roots – as many as the dimension of the space – we can choose a basis of our space, for which each basis member is an eigenvector of our endomorphism.
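As a numpy sketch (the matrix is my own example, with the two distinct roots 2 and 3): the eigenvectors are linearly independent, so form a basis, in which the map acts by scaling each basis member.

```python
import numpy as np

f = np.array([[2.0, 1.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(f)      # columns of vecs are eigenvectors

# two distinct eigenvalues ...
assert len(np.unique(np.round(vals, 6))) == 2
# ... so the eigenvectors form a basis: the matrix of columns is invertible
assert abs(np.linalg.det(vecs)) > 1e-9
# in the eigenbasis, f is diagonal: each basis member is simply scaled
assert np.allclose(np.linalg.inv(vecs) @ f @ vecs, np.diag(vals))
```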

Endomorphism polynomials

Now, just as we can build up polynomials as functions from scalars to scalars, we can actually also apply them to endomorphisms of a linear space, treating any constant term in the polynomial as a scaling of the linear space and reading powers of the free variable in terms of repeated composition of the endomorphism. Thus, given linear (V: f |V), we use power(0, f) = V (the identity) and, for each natural n, power(1+n, f) = f&on;power(n, f) to give meaning to any polynomial sum(k.power) for mapping ({scalars}: k :n) for some natural n. Indeed, we can even extend this to ({endomorphisms of V}: k :n) for arbitrary natural n, using sum(: (: k(i)&on;power(i, f) ←f :) ←i :) in place of sum(: k(i).power(i) ←i :), but I only need scalar k for the present.
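This can be sketched directly in numpy, with square matrices standing in for endomorphisms (the helper name poly_of_map is my own):

```python
import numpy as np

def poly_of_map(k, f):
    """Evaluate sum(k.power) at the square matrix f: k[i] is the
    coefficient of order i; k[0] scales the identity."""
    n = f.shape[0]
    result = np.zeros((n, n))
    for c in reversed(k):            # Horner's scheme, with matrices
        result = result @ f + c * np.eye(n)
    return result

f = np.array([[0.0, 1.0],
              [0.0, 0.0]])
# p(x) = 1 + 2.x gives p(f) = V + 2.f
assert np.allclose(poly_of_map([1.0, 2.0], f), np.eye(2) + 2 * f)
# this f is nilpotent: f&on;f = 0, so f is a root of power(2)
assert np.allclose(poly_of_map([0.0, 0.0, 1.0], f), np.zeros((2, 2)))
```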

When ({scalars}: k :1+n) has non-zero k(n), the polynomial sum(k.power) is said to have degree n; the term in it of form k(i).power(i), for any i, is referred to as its term of order i and k(i) is its coefficient of order i; its coefficient of highest order is k(n). (The zero polynomial, (: 0 ←x :), is understood as having some negative value as its degree, since it has no (non-zero) term of any natural order.)

Now, if V has dimension n, {linear map (V:|V)} has dimension n.n. For any given (V: f |V), the list (: power(i, f) ←i |1+n.n) = [V, f, f&on;f, …, power(n.n, f)] has 1+n.n entries, whose span falls in a space of dimension n.n, so there must be some linear dependence among the power(i, f) with i ≤ n.n; and any such linear dependence can equally be read as a polynomial, with scalar coefficients, of which f is a root. Let P be the collection of polynomials, with scalar coefficients, of which f is a root. Multiplying any member of P by any polynomial yields a member of P; adding two members of P yields a member of P (so P is an ideal in the ring of polynomials). We know, from the linear dependence among the powers of f, that P has at least some non-zero member of positive degree at most n.n. Dividing any member of P by its coefficient of highest order gives a member of P, of the same degree, with a simple power as highest order term. Let m be a member of P with least positive degree whose highest order term is a simple power; so m is sum(k.power) for some ({scalars}: k :1+i) with i natural and k(i) = 1.
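For instance (numpy, with a 2×2 matrix of my choosing, so n.n = 4): flattening the 1+n.n powers into vectors makes the dependence a rank statement.

```python
import numpy as np

# a hypothetical 2x2 map, so {linear map (V:|V)} has dimension n.n = 4
f = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# flatten the five matrices [V, f, f&on;f, ..., power(4, f)] to vectors:
powers = np.stack([np.linalg.matrix_power(f, i).ravel() for i in range(5)])
# five vectors in a 4-dimensional space are necessarily dependent:
assert np.linalg.matrix_rank(powers) <= 4
# here the dependence already shows up at degree two:
# f&on;f - 5.f - 2.V = 0, a polynomial with f as a root
assert np.allclose(f @ f - 5 * f - 2 * np.eye(2), np.zeros((2, 2)))
```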

For any polynomial p in P we can now do polynomial long division to obtain polynomials q, r for which p = m.q +r with r of degree less than the degree of m. (To divide p of degree i+a by m of degree i, using coeff = (: ({scalars}: k(j) ← sum(k.power) :{polynomials}) ←j :{naturals}), define ({polynomials}: R |1+a) and ({scalars}: h |1+a) by: R(a) = p, h(j) = coeff(i+j, R(j))/coeff(i, m), R(j) = R(j+1) −h(1+j).power(1+j).m; this ensures R(j) has degree at most i+j and leads to q = sum(h.power) and r = R(0) −h(0).m of degree less than i with r +q.m = p.) Now p in P has p(f) = 0 and m(f) = 0, so r(f) = p(f) −q(f).m(f) = 0 and r is in P of lower degree than m, so r = 0 by the specification of m (which makes r of degree at most zero; but a zero-degree polynomial is constant and r(f) = 0 then makes r = 0, with negative degree). Consequently, every member of P is some polynomial multiple of m; in particular, m is the unique polynomial of least degree with leading-order coefficient 1. It is thus known as the minimal polynomial of f (i.e. of which f is a root).
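The division in the parenthesis can be sketched in plain Python, with polynomials as coefficient lists indexed by order (the function name is my own; it loosely mirrors the text's R(j) and h(j)):

```python
def poly_divmod(p, m):
    """Divide polynomial p by m, both given as coefficient lists with
    index = order; returns (q, r) with p = m.q + r and deg(r) < deg(m)."""
    r = list(p)                       # R(a) = p, reduced in place
    q = [0.0] * max(len(p) - len(m) + 1, 1)
    for j in range(len(p) - len(m), -1, -1):
        h = r[j + len(m) - 1] / m[-1]   # h(j) = coeff(i+j, R(j))/coeff(i, m)
        q[j] = h
        for t in range(len(m)):         # subtract h(j).power(j).m
            r[j + t] -= h * m[t]
    return q, r[:len(m) - 1]

# (x + 1).(x.x + x + 2) + 2 = x^3 + 2.x^2 + 3.x + 4:
q, r = poly_divmod([4.0, 3.0, 2.0, 1.0], [1.0, 1.0])
assert q == [2.0, 1.0, 1.0] and r == [2.0]
```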

Now suppose we have some eigenvector v of f with eigenvalue k; then power(i, f, v) shall be equal to power(i, k).v for every natural i and, whenever f is a root of some polynomial p, 0 = p(f, v) = p(k).v, so p(k) = 0 and k is also a root of p; so every eigenvalue of f is a root of every polynomial in P, in particular of m. Conversely, suppose k is some scalar root of m; then m = q.(: x −k ←x :) for some scalar polynomial q of degree one less than m's (obtained by long division, with the remainder necessarily equal to m(k) = 0). As q's degree is less than that of m, q(f) is non-zero, so there is some u in V for which v = q(f, u) is non-zero; but now apply f −k.V to v to get ((f −k.V)&on;q(f))(u); since (f −k.V)&on;q(f) = m(f) = 0, this is zero and non-zero v is in kernel(f −k.V), so is an eigenvector of f with eigenvalue k. So the eigenvalues of f are exactly the roots of the minimal polynomial of f. The minimal and characteristic polynomials of an endomorphism thus have the same roots, albeit possibly with different multiplicities.
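A small numpy check, using a projection of my own choosing (f&on;f = f, so its minimal polynomial is power(2) −power(1), i.e. x.x −x): the roots 0 and 1 of that polynomial are exactly the eigenvalues.

```python
import numpy as np

# a projection: f&on;f = f, so f is a root of x.x - x
f = np.array([[1.0, 0.0],
              [0.0, 0.0]])
assert np.allclose(f @ f, f)

# no monic degree-one polynomial x - c has f as a root: f - c.V is
# never the zero map (that would make f a scaling); so x.x - x is
# minimal, and its roots, 0 and 1, are exactly f's eigenvalues:
assert np.allclose(sorted(np.linalg.eigvals(f)), [0.0, 1.0])
```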

The trajectory of a vector

Next, for any vector v in V of dimension n and linear (V: f |V), consider (: power(i, f, v) ←i :1+n) = [v, f(v), f(f(v)), …, power(n, f, v)], a list of 1+n vectors in an n-dimensional space; there must exist a linear dependence among these. Let m ≤ n be the least for which (: power(i, f, v) ←i :1+m) is linearly dependent; so b = (: power(i, f, v) ←i :m) is linearly independent and power(m, f, v) is some linear combination of it, sum(z.b) with ({scalars}: z |m). Applying f to any linear combination of b, we turn each entry b(i) with 1+i in m into b(1+i), and the final entry into power(m, f, v) = sum(z.b), thereby obtaining a linear combination of b; so span(b) is preserved by f. This holds for any linear (V: f |V) and v in V. Define

Orbit(f, v)
= span(: power(i, f, v) ←i :{naturals}), the subspace of V spanned by v and the results of repeatedly applying f to it; as just shown, f preserves it.

Now let U = Orbit(f, v) of dimension 1+m, with b = (: power(i, f, v) ←i |1+m) as basis of U and power(1+m, f, v) = sum(z.b) for some ({scalars}: z :1+m). With (dual(U): p |1+m) as the basis dual to b, we have (U: f |U) = sum(: b(1+i)×p(i) ←i |m) +sum(z.b)×p(m). Subtracting k.U and taking determinant, det(f −k.U) = Alt(1+m, f −k.U) maps bulk(^, b) = b(0)^b(1)^…^b(m) to

bulk(^, (f −k.U)&on;b)
= (f(b(0)) −k.b(0))^…^(f(b(m)) −k.b(m))
= bulk(^, (U: b(1+i) −k.b(i) ←i |m))^(sum(z.b) −k.b(m))

in which any term with a repeated entry in b gives zero, by antisymmetry. If the i-term uses b(1+i) then, with 1+i = j, the j-term must likewise use b(1+j), as using its −k.b(j) part would give zero; hence all terms after these must likewise use their b(1+…) part rather than their −k.b part; conversely, if the i-term uses its −k.b(i) part then, for each j in i, the j-term must also be using its −k.b(j) part. If the i-term uses its −k.b(i) part and, with j = 1+i, the j-term uses its b(1+j) part, then the over-all product uses −k.b(h) for each h in j and b(1+h) for each h in m with h ≥ j, giving us terms in b(0) up to b(i) and b(1+j) up to b(m), so that the only part of sum(z.b) −k.b(m) that isn't using a repeated b is z(j).b(j) – except in the case j = m, where the bulk(^) has used −k.b(h) for each h in m and we get the (z(m) −k).b(m) part. We're thus left with:

= sum(: z(j).bulk(^, −k.(:b:j))^bulk(^, (: b(1+h+j) ←h+j :m))^b(j) ←j |m) +(z(m) −k).bulk(^, −k.(:b:m))^b(m)

in which the j term has j factors of −k and needs to be cyclically permuted to move the b(j) factor past m−j others of form b(1+h+j) with h+j in m; this permutation introduces a factor of power(m−j, −1), so we get:

= power(m, −1).(sum(: z(j).power(j, k) ←j :m) +(z(m) −k).power(m, k)).bulk(^, b)
= power(m, −1).(sum(: z(j).power(j, k) ←j :1+m) −power(1+m, k)).bulk(^, b)

Thus, give or take an uninteresting factor of ±1, the characteristic polynomial is ±χ(U:f|U) = power(1+m) −sum(z.power). The coefficients of the lower-order terms are simply the coefficients in the linear dependence.
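As a sketch in numpy (coordinates relative to the trajectory basis b, with a two-dimensional orbit and z of my choosing): the matrix of (U: f |U) sends b(0) to b(1) and the last basis vector to sum(z.b), and its characteristic polynomial's lower-order coefficients are just the entries of z, negated.

```python
import numpy as np

# hypothetical dim-2 orbit: f(b(0)) = b(1), f(b(1)) = z(0).b(0) + z(1).b(1)
z = [2.0, 1.0]
f = np.array([[0.0, z[0]],
              [1.0, z[1]]])
# characteristic polynomial power(2) - z(1).power(1) - z(0).power(0),
# i.e. x.x - x - 2; np.poly lists coefficients highest order first:
assert np.allclose(np.poly(f), [1.0, -z[1], -z[0]])
```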

Now consider any polynomial p for which (U:p(f)|U) is zero or p(f, v) is zero, with U = Orbit(f, v). If (U:p(f)|U) is zero, then v in U implies p(f, v) is zero. If p(f) maps v to zero, then it must also map each power(i, f, v) to zero, since each power(i, f) commutes with p(f) and power(i, f, p(f, v)) is zero; it thus maps all of U to zero. So the same polynomials map (:f:U) to the zero mapping as map (:f:{v}) to zero and the minimal polynomial of (:f:U) is equally the minimal polynomial of f acting simply on v. This minimal polynomial is p = power(1+m) −sum(z.power) for some ({scalars}: z |1+m) and natural m; and it satisfies p(f, v) = 0, hence power(1+m, f, v) = sum(: z(i).power(i, f, v) ←i |1+m) and there is no smaller m for which there exists any such z; which is exactly the statement that m is the least for which power(1+m, f, v) is in the span of {power(i, f, v): i in 1+m}, the condition we used above when defining U; so the minimal polynomial of (:f:U) is, aside from possibly its sign, exactly the characteristic polynomial given above. Thus, for an endomorphism's restriction to the subspace generated by the endomorphism's action on one vector, the characteristic polynomial is the minimal polynomial (give or take sign). In particular, this means the characteristic polynomial of (:f:U) maps (:f:U) to zero, for any U = Orbit(f, v) given any v in (:f|). This gives us a class of particular cases in which χ(g, g) is zero, namely any g which is the restriction of some linear map to its orbit of some specific vector.

The Cayley-Hamilton theorem

The thing to show next is that, in fact, χ(f, f) is zero for every endomorphism f; the characteristic polynomial of an endomorphism necessarily maps that endomorphism to the zero endomorphism on its space. This is the Cayley-Hamilton theorem. It is equivalent to the characteristic polynomial necessarily being a multiple of the minimal polynomial.
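A numeric spot-check, not a proof (numpy, an arbitrary seeded 3×3 real matrix of my choosing): evaluating the characteristic polynomial at the matrix itself gives the zero map.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((3, 3))

# np.poly lists the characteristic polynomial's coefficients from
# highest order down: x^3 + c[1].x^2 + c[2].x + c[3]
c = np.poly(f)
chi_f = sum(c[i] * np.linalg.matrix_power(f, 3 - i) for i in range(4))
assert np.allclose(chi_f, np.zeros((3, 3)))   # Cayley-Hamilton
```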

First consider the case of an endomorphism (V:f|V) for which there is some proper non-trivial subspace, W, of V that f preserves, i.e. for which (W:f|W). Pick a basis of W and extend it to a basis e of V, such that (V:e|n) for some natural n, with (W:e|m) for some positive m in n; as W is non-trivial, m > 0 and, as W is proper, m < n.
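In matrix terms (a numpy sketch with n = 3, m = 1 and numbers of my own): relative to such a basis e, preserving W = span of the first m basis vectors makes the matrix block upper-triangular, so applying f to a member of W leaves the trailing coordinates zero.

```python
import numpy as np

# W = span(e(0)) is preserved: column 0 has zeros below its first entry
f = np.array([[2.0, 5.0, 7.0],
              [0.0, 1.0, 3.0],
              [0.0, 4.0, 2.0]])
w = np.array([1.0, 0.0, 0.0])         # a member of W
assert np.allclose((f @ w)[1:], 0.0)  # f(w) lies back in W
```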


Written by Eddy.