The Quaternions

Irish mathematician William Rowan Hamilton (also responsible for a very important advance in how to describe mechanical systems) invented or discovered (take your pick) a new algebra in 1843 that has proven highly useful in various ways. Hamilton famously struggled with trying to fit the pieces together right and the answer hit him while on a walk, so he carved it into the stone of the bridge he was crossing as he thought of it – and there is a plaque on the bridge to commemorate this.

3Blue1Brown has a series of videos on the subject that give some nice visual intuitions, starting here. Kathy Loves Physics & History has a more history-focused introduction to the quaternions.

Derivation

So let's retrace the steps that Hamilton struggled with. Start by adding, to the complex numbers, a second imaginary square root of −1, notionally perpendicular to both the real line and the first one. As for the complex numbers, assume that multiplying by real numbers is commutative, i.e. x.q = q.x for any real x and whatever our resulting algebra now allows q to be; but let's not assume the same for multiplication by non-real values. (As Hamilton only knew of algebras in which that is true, the failure to consider the alternative is, I think, the principal thing that held him back. As you'll see, simply shedding that assumption leads us quickly to the result he struggled to find.) Let's use the names i, j for the two independent square roots of −1; the question of what we get when we multiply them then arises.

Absolute value and conjugation

On the complex numbers, we have an absolute value homomorphism from its non-zero multiplication to that of the positive reals within it (it also maps zero to zero); let's presume that we'll have one of these for our new algebra. Since it preserves positive reals, 1 = abs(1); since it's a homomorphism, with positive outputs for non-negative inputs, any q for which power(n, q) = 1, for some positive natural n, has power(n, abs(q)) = 1 and thus abs(q) = 1, the unique positive real of which any positive power is 1. This, in particular, implies abs(−1) = 1 and every square root of −1 also has abs 1. Any value whose abs is 1 is the multiplicative inverse of its conjugate (since their product is the square of its abs). For any square root q of −1, we have 1 = −q.q so 1/q = −q; thus its conjugate is its inverse under both addition and multiplication. In particular, *(i) = −i and *(j) = −j.

Conjugation preserves real values, which commute with all others, so we presume conjugation to be real-linear – that is, *(p +x.q) = *(p) +x.*(q) for all real x. Our premise that i, j and 1 are mutually orthogonal can be expressed as saying that the square of abs(w +x.i +y.j) must be w.w +x.x +y.y for all real w, x and y. So now let's work out what that square is directly:

(w +x.i +y.j).(w −x.i −y.j)
= w.w −w.x.i −w.y.j +x.w.i −x.x.i.i −x.y.i.j +y.w.j −y.x.j.i −y.y.j.j
= w.w +x.x +y.y +(x.w −w.x).i +(y.w −w.y).j −x.y.i.j −y.x.j.i
= w.w +x.x +y.y −x.y.(i.j +j.i)

For this to be w.w +x.x +y.y, even when x and y are both non-zero, we must in fact have i.j +j.i = 0. That means i and j anti-commute, i.e. their product's sign is flipped by swapping the order of multiplication. (This was exactly the stumbling block that held Hamilton back, and the insight that unravelled the whole mystery for him.) Additionally, i.j.j.i = i.(−1).i = −i.i = 1; so i.j and j.i are each other's additive and multiplicative inverses. This makes them square roots of −1 in their own right; and, in particular, mutually conjugate.

Direction

It remains to determine how i.j = −j.i relates to our other units. To this end, it suffices to look at the absolute value of a linear combination of i.j with the three units given to be orthogonal. Consider q = w +x.i +y.j +z.i.j; we know *(i) = −i, *(j) = −j and *(i.j) = j.i = −i.j, so we have *(q) = w −x.i −y.j +z.j.i whence the square of abs(q) will be

q.*(q)
= (w +x.i +y.j +z.i.j).(w −x.i −y.j +z.j.i)
= w.w −w.x.i −w.y.j +w.z.j.i +x.w.i −x.x.i.i −x.y.i.j +x.z.i.j.i +y.w.j −y.x.j.i −y.y.j.j +y.z.j.j.i +z.w.i.j −z.x.i.j.i −z.y.i.j.j +z.z.i.j.j.i
= w.w +x.x +y.y +z.z +(x.w −w.x −y.z +z.y).i +(y.w −w.y).j +(x.z −z.x).i.j.i +w.z.j.i +z.w.i.j −y.x.j.i −x.y.i.j

but reals commute, making the coefficients of i, j and i.j.i all zero

= w.w +x.x +y.y +z.z +(w.z −x.y).(j.i +i.j)
= w.w +x.x +y.y +z.z

since j.i +i.j = 0. (Note that this is only exercised once, at the very end; even without it, the other terms all cancel.)

This gives q.*(q) = w.w +x.x +y.y +z.z, which tells us that i.j is itself orthogonal to i, j and the reals. In particular, being orthogonal, i, j, i.j and 1 are linearly independent. So every value in their real-span is uniquely expressible as a real-linear combination of i, j, i.j and 1. Its squared abs is the sum of the squares of the real coefficients in that expression, while our conjugation preserves the real part and negates the real coefficients of i, j and i.j. In particular, abs(p) and *(p) are zero precisely when p is zero, for p in this real-span.

The geometry, implied by abs as length, grants i.j the same status of orthonormal unit as each of i, j and 1. Given that i.j.i = −j.i.i = j and j.i.j = −j.j.i = i, the multiplication considers any two of {i, j, i.j} to be equally adequate to generate the whole algebra. So let us grant i.j the same status as i and j by giving it a name, k = i.j. We then have *(k) = −k, i = j.k, j = k.i implying {−1} = {i.j.k, j.k.i, k.i.j, k.k, j.j, i.i} and {1} = {k.j.i, j.i.k, i.k.j}.
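
This multiplication table is easy to mechanise. Here's a minimal Python sketch (the names, qmul among them, are mine, purely for illustration), representing w +x.i +y.j +z.k as the tuple (w, x, y, z) and checking the relations just listed:

def qmul(p, q):
    # Quaternion product of p = (w, x, y, z), read as w +x.i +y.j +z.k,
    # derived from i.i = j.j = k.k = -1 and i.j = k = -j.i, j.k = i = -k.j, k.i = j = -i.k.
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

assert qmul(i, j) == k and qmul(j, k) == i and qmul(k, i) == j
assert qmul(j, i) == (0, 0, 0, -1)        # j.i = -k: i and j anticommute
for q in (i, j, k):
    assert qmul(q, q) == minus_one        # each is a square root of -1
assert qmul(qmul(i, j), k) == minus_one   # i.j.k = -1
assert qmul(qmul(k, j), i) == one         # k.j.i = 1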

Longer products

What happens when we multiply values in the real-span of {1, i, j, i.j} ? In particular, is it possible to get a finite product outside that span ? Applying distributivity lets us decompose any finite product of members of the span into a sum of terms, each of which is a product of real coefficients (of {1, i, j, i.j}, drawn from the various factors) intermixed with the generators those coefficients scaled. Commutativity of real-multiplication lets us pull all real factors out to the front of the product, making the typical term in the sum a finite product of a real (possibly 1) and potentially some factors drawn from {i, j}.

In any finite product of a real times at least three factors from {i, j}, if any two adjacent factors are the same, we can combine them to their square, −1, which is real, and real-commutativity allows us to bring it out to combine into the real factor in our product. Otherwise, no two adjacent terms in the product being the same, the first three factors are either i.j.i = −j.i.i = j or j.i.j = −j.j.i = i so we can replace them with a single factor. In each case, we shorten our sequence of factors from {i, j} by an even number of factors, preserving the parity of the sequence's length.

So, inductively, any product of a real times an even number of factors drawn from {i, j} reduces either to a real or to a real times i.j; and any product of a real times an odd number of factors drawn from {i, j} reduces to a real times either i or j. Thus every product of factors in the real-span of {1, i, j, k} is a sum of terms, each of which is a real times one of these four generators, hence itself in that span. Members of that span are called quaternions or, historically, Hamilton's numbers.

We saw, earlier, that every non-zero quaternion has non-zero abs. Since abs is a homomorphism to {reals} and no product of non-zero reals is zero, no product of non-zero quaternions is zero. (We needed to know every product of quaternions is a quaternion for that last step.)

Inverses and conjugates of products

Given the relationship between our conjugation and real-valued abs, abs(p).abs(p) = p.*(p), every p with non-zero abs has an inverse, 1/p = *(p)/abs(p)/abs(p). As abs(p) is non-zero whenever p is, every non-zero quaternion thus has a multiplicative inverse.
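
Continuing the sketch above (conj, qabs and qinv are again names of my own choosing), conjugation, abs and inversion are as easy to encode, and we can check that a quaternion times its inverse gives 1:

from math import sqrt

def conj(q):
    # *(w +x.i +y.j +z.k) = w -x.i -y.j -z.k
    w, x, y, z = q
    return (w, -x, -y, -z)

def qabs(q):
    # abs(q) is the square root of q.*(q), the sum of the squared coefficients
    return sqrt(sum(c*c for c in q))

def qinv(q):
    # 1/q = *(q)/abs(q)/abs(q), defined whenever q is non-zero
    n = sum(c*c for c in q)
    return tuple(c/n for c in conj(q))

p = (1.0, 2.0, -3.0, 0.5)
w, x, y, z = qmul(p, qinv(p))
assert abs(w - 1) < 1e-12 and max(abs(x), abs(y), abs(z)) < 1e-12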

We saw above that *(i.j) = j.i = *(j).*(i); so does conjugation of a product generally swap the order of factors and conjugate each – making it a coautomorphism of our multiplication – or does that only apply sometimes ?

First observe that, since q.*(q) is real, hence commutes with other quaternions, p.q.*(q).*(p) = p.*(p).q.*(q) is likewise real. Furthermore, since abs is a homomorphism to reals, which commute: p.q.*(p.q) = abs(p.q).abs(p.q) = abs(p).abs(p).abs(q).abs(q) = p.*(p).q.*(q) = p.q.*(q).*(p). When either p or q is zero, *(p.q) = *(q).*(p) follows fatuously from *(0) = 0; otherwise, as p, q are non-zero, implying their product is non-zero so has non-zero abs and thus is invertible, we can cancel the factor of p.q in the foregoing to obtain *(p.q) = *(q).*(p). So *(p.q) = *(q).*(p) whether or not 0 is in {p, q}.

Thus conjugation is a coautomorphism of quaternion multiplication. In particular, this implies that inversion is likewise a coautomorphism of non-zero quaternion multiplication: 1/(p.q) = *(q).*(p)/(p.*(p).q.*(q)) = (1/q).(1/p). Notice that, for real and complex non-zero multiplication, which are commutative, there is no difference between automorphism and coautomorphism; and, indeed, their inversions are both, as are their conjugations (albeit that of the reals is fatuous).
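
A quick numeric check of this, reusing qmul and conj from the sketches above, with arbitrary non-commuting test values:

p = (1, -2, 1, 3)
q = (2, 4, 1, -1)
assert conj(qmul(p, q)) == qmul(conj(q), conj(p))   # *(p.q) = *(q).*(p)
# whereas keeping the factors in order gives a different answer,
# since these p and q do not commute:
assert conj(qmul(p, q)) != qmul(conj(p), conj(q))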

Summary

We thus have an algebra whose values are {w +x.i +y.j +z.k: real w, x, y, z}, closed under multiplication and addition, equipped with a homomorphism abs of its multiplication – that maps its zero member to zero, each positive real to itself and all other values to positive reals – and a coautomorphism of its multiplication, called conjugation and denoted *(p) ←p, for which p.*(p) = abs(p).abs(p). For the metric defined by this square of abs, the generators 1, i, j and k are mutually orthonormal. The non-zero multiplication forms a group (so never produces a zero product) and the arithmetic forms a ring.

Hamilton set out to make a three-dimensional algebra but ended up with a four-dimensional one, with one real dimension and three imaginary ones. As we'll see below, those three imaginary ones can be used to model a three-dimensional real vector space, leading to a geometric interpretation of our multiplication.

OK, so far so neat and tidy, but what's it good for ? Let's take a look at something it's used for extensively in software that has to model three-dimensional rotations – notably including anything that displays a dynamically updated 3-D scene to the user. There are other ways to represent those, but this one is significantly more computationally robust.

Rotations

Notice that left-multiplying a quaternion by i performs a quarter turn rotation of its (j, k) components and of its (1, i) components; i.(u.j +v.k) = u.k −v.j and i.(u +i.v) = −v +i.u. As a result, for any angle a, Cos(a) +i.Sin(a) will likewise rotate through angle a, in each case. When we right-multiply by i, we likewise perform a quarter turn; this turns the (1, i) components the same way but turns the (j, k) components the opposite way, (u.j +v.k).i = v.j −u.k. As a result, if we multiply by Cos(a) +i.Sin(a) on the left and Cos(a) −i.Sin(a) on the right, the effect on the (1, i) components will cancel out but the effect on the (j, k) components will compound.

In working this out, I'll exploit two useful facts: two sums involving only i and reals will commute (just as they do for complex numbers, because i commutes with itself and reals) and multiplying such a sum, q, by j satisfies q.j = j.*(q). The latter works because j commutes with the real part of q and anticommutes with its i part.

(Cos(a) +i.Sin(a)).(w +x.i +y.j +z.k).(Cos(a) −i.Sin(a))
= (Cos(a) +i.Sin(a)).(w +x.i +(y +z.i).j).(Cos(a) −i.Sin(a))
= (Cos(a) +i.Sin(a)).(w +x.i).(Cos(a) −i.Sin(a)) +(Cos(a) +i.Sin(a)).(y +z.i).j.(Cos(a) −i.Sin(a))
= (w +x.i).(Cos(a) +i.Sin(a)).(Cos(a) −i.Sin(a)) +(y +z.i).(Cos(a) +i.Sin(a)).(Cos(a) +i.Sin(a)).j
= (w +x.i).(Cos(a).Cos(a) +Sin(a).Sin(a)) +(y +z.i).(Cos(a).Cos(a) +2.i.Cos(a).Sin(a) −Sin(a).Sin(a)).j

We can now use the double-angle formulae, Sin(2.a) = 2.Sin(a).Cos(a) and Cos(2.a) = Cos(a).Cos(a) −Sin(a).Sin(a), to simplify the latter term of this, along with the usual pythagorean property of Sin and Cos for the first:

= w +x.i +(y +z.i).(Cos(2.a) +i.Sin(2.a)).j
= w +x.i +(Cos(2.a) +i.Sin(2.a)).(y.j +z.k)
= w +x.i +Cos(2.a).(y.j +z.k) +Sin(2.a).(y.k −z.j)
= w +x.i +(Cos(2.a).y −Sin(2.a).z).j +(Sin(2.a).y +Cos(2.a).z).k

constituting a rotation in the j, k plane through angle 2.a, leaving the real and i components unchanged. Naturally, we can do the same with j or k in place of i to rotate the other two imaginary components while leaving the real part and the component whose unit replaced i unchanged. In each case, our real Cos(a) plus imaginary times Sin(a) add up to a unit quaternion, one whose abs is 1.
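
Reusing qmul and conj from the sketches above, we can watch this happen numerically; with a = turn/12, the (j, k) components should turn through a sixth of a turn:

from math import cos, sin, pi

a = pi / 6                       # turn/12; the (j, k) components turn through 2.a
g = (cos(a), sin(a), 0.0, 0.0)   # Cos(a) +i.Sin(a), a unit quaternion
h = (2.0, 3.0, 1.0, 0.0)         # w +x.i +y.j +z.k with (y, z) = (1, 0)
w, x, y, z = qmul(qmul(g, h), conj(g))
assert abs(w - 2.0) < 1e-12 and abs(x - 3.0) < 1e-12             # real, i parts unchanged
assert abs(y - cos(2*a)) < 1e-12 and abs(z - sin(2*a)) < 1e-12   # (j, k) rotated by 2.a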

If we take any quaternion q and divide it by abs(q) we get a unit quaternion; if we multiply by that on the left and by its conjugate on the right, we might reasonably hope to see the result be a rotation about its imaginary component, through twice the angle whose Cos and Sin are the real part and the length of the imaginary part of q/abs(q). Now, multiplying by *(q) and dividing by two factors of abs(q) is equivalent to dividing by abs(q).abs(q)/*(q) = q, so this suggests we can implement rotations by multiplying by a quaternion on the left and dividing by it on the right. To see whether this works, however, we need to consider a general quaternion, or at least the result of dividing it by its abs.

Alternate units

Rather than coming at that directly, however, let's consider a general imaginary unit, I = i.Cos(a) +Sin(a).(j.Cos(c) +Sin(c).k), with a between zero and turn/2 and c in the range from zero to turn; one of the other imaginary units perpendicular to this is J = i.Sin(a) −Cos(a).(j.Cos(c) +Sin(c).k). As before, I'll express things in the span of j and k as either j or k times things in the span of real and i, which commute with one another and conjugate-commute with factors of j and k. Now let's next consider

K = I.J
= (i.Cos(a) +Sin(a).(j.Cos(c) +Sin(c).k)).(i.Sin(a) −Cos(a).(j.Cos(c) +Sin(c).k))
= (i.Cos(a) +Sin(a).(Cos(c) +Sin(c).i).j).(i.Sin(a) −Cos(a).(Cos(c) +Sin(c).i).j)
= i.i.Cos(a).Sin(a) −Cos(a).Cos(a).(Cos(c) +Sin(c).i).i.j +Sin(a).Sin(a).(Cos(c) +Sin(c).i).j.i −Cos(a).Sin(a).j.(Cos(c) −Sin(c).i).(Cos(c) +Sin(c).i).j
= −Cos(a).Sin(a) −Cos(a).Cos(a).(Cos(c) +Sin(c).i).k −Sin(a).Sin(a).(Cos(c) +Sin(c).i).k −Cos(a).Sin(a).j.(Cos(c).Cos(c) −i.i.Sin(c).Sin(c)).j
= −Cos(a).Sin(a) −(Cos(c) +Sin(c).i).k −Cos(a).Sin(a).j.j
= Sin(c).j −Cos(c).k
I.I
= (i.Cos(a) +Sin(a).(j.Cos(c) +Sin(c).k)).(i.Cos(a) +Sin(a).(j.Cos(c) +Sin(c).k))
= (i.Cos(a) +Sin(a).(Cos(c) +Sin(c).i).j).(i.Cos(a) +Sin(a).(Cos(c) +Sin(c).i).j)
= i.i.Cos(a).Cos(a) +Cos(a).Sin(a).(Cos(c) +Sin(c).i).i.j +Cos(a).Sin(a).(Cos(c) +Sin(c).i).j.i +Sin(a).Sin(a).j.(Cos(c) −Sin(c).i).(Cos(c) +Sin(c).i).j
= −Cos(a).Cos(a) +Sin(a).Sin(a).j.(Cos(c).Cos(c) −i.i.Sin(c).Sin(c)).j
= −1
J.J
= (i.Sin(a) −Cos(a).(j.Cos(c) +Sin(c).k)).(i.Sin(a) −Cos(a).(j.Cos(c) +Sin(c).k))
= (i.Sin(a) −Cos(a).(Cos(c) +Sin(c).i).j).(i.Sin(a) −Cos(a).(Cos(c) +Sin(c).i).j)
= i.i.Sin(a).Sin(a) −Sin(a).Cos(a).(Cos(c) +Sin(c).i).i.j −Cos(a).Sin(a).(Cos(c) +Sin(c).i).j.i +Cos(a).Cos(a).j.(Cos(c) −Sin(c).i).(Cos(c) +Sin(c).i).j
= −1
K.K
= (Sin(c).j −Cos(c).k).(Sin(c).j −Cos(c).k)
= −Sin(c).Sin(c) −Cos(c).Cos(c) −Cos(c).Sin(c).(j.k +k.j)
= −1

From which we can infer I.J.K = K.K = −1 and all the analogous relations among I, J, K to match those among i, j and k. Consequently, multiplying by Cos(b) +I.Sin(b) on the left and by its conjugate on the right will indeed be a rotation in the J, K components through 2.b, as before. Even if we'd chosen a different imaginary unit perpendicular to I, in place of J, this would just have given us different orthonormal coordinates in the plane perpendicular to I, which makes no difference to the transformation being a rotation about the I axis.
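
Again, the helpers from the sketches above let us test these relations numerically for arbitrary angles (big_i, big_j, big_k and close are my names for this illustration):

from math import cos, sin

def close(p, q, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(p, q))

a, c = 0.7, 2.3   # arbitrary angles
big_i = (0.0, cos(a), sin(a)*cos(c), sin(a)*sin(c))     # I
big_j = (0.0, sin(a), -cos(a)*cos(c), -cos(a)*sin(c))   # J
big_k = qmul(big_i, big_j)                              # K = I.J
assert close(big_k, (0.0, 0.0, sin(c), -cos(c)))
minus_one = (-1.0, 0.0, 0.0, 0.0)
for q in (big_i, big_j, big_k):
    assert close(qmul(q, q), minus_one)   # each squares to -1
assert close(qmul(big_j, big_k), big_i)   # J.K = I
assert close(qmul(big_k, big_i), big_j)   # K.I = J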

Thus, indeed, multiplying by a quaternion q on the left and dividing by it on the right has the effect of leaving its real part unchanged while rotating the imaginary part of any quaternion about the imaginary part of q, through twice the angle whose ratio of Sin to Cos matches the ratio of the magnitude of q's imaginary part to its real part.
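
As a concrete illustration, reusing qmul and qinv from the sketches above (rotate_by is my name for the sandwiching operation): conjugating by 1 +i +j +k, which is not a unit, whose imaginary part lies along the diagonal i +j +k and whose ratio of imaginary length to real part is that of Sin to Cos for a sixth of a turn, should rotate through a third of a turn about that diagonal, cyclically permuting i, j and k:

def rotate_by(q, h):
    # multiply by q on the left and divide by q on the right
    return qmul(qmul(q, h), qinv(q))

q = (1.0, 1.0, 1.0, 1.0)   # real part 1, imaginary part i +j +k; abs is 2, not 1
w, x, y, z = rotate_by(q, (5.0, 1.0, 0.0, 0.0))
assert abs(w - 5.0) < 1e-12                                         # real part preserved
assert abs(x) < 1e-12 and abs(y - 1.0) < 1e-12 and abs(z) < 1e-12   # i taken to j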

3-vector algebra

Given that the preceding is very much understanding imaginary parts of quaternions as vectors in a three-dimensional space, it makes sense to consider how our arithmetic relates to the geometry of such a space.

Consider two imaginary values, q = x.i +y.j +z.k and p = r.i +s.j +t.k with x, y, z, r, s and t real. Since q and p are imaginary, each is the negation of its conjugate, *(p) = −p, *(q) = −q. When we multiply them, we get

q.p
= (x.i +y.j +z.k).(r.i +s.j +t.k)
= x.r.i.i +x.s.i.j +x.t.i.k +y.r.j.i +y.s.j.j +y.t.j.k +z.r.k.i +z.s.k.j +z.t.k.k
= −(x.r +y.s +z.t) +(y.t −z.s).i +(z.r −x.t).j +(x.s −y.r).k
p.q
= −(x.r +y.s +z.t) +(z.s −y.t).i +(x.t −z.r).j +(y.r −x.s).k

These have the same real part while their imaginary parts sum to zero. This is because each of i, j, k commutes with itself, producing a real value, but anti-commutes with the others, producing an imaginary one. These two parts are the original forms of the inner and outer products of two 3-vectors. For now, I'll write them as dot(q, p) = −(q.p +p.q)/2 and cross(q, p) = (q.p −p.q)/2, so that q.p = cross(q, p) −dot(q, p). The output of dot is real, while that of cross is imaginary; and each is real-linear in each of its parameters.

A general quaternion is just an imaginary one plus a real, so let w and u be reals and observe (w +q).(u +p) = w.u −dot(q, p) +w.p +u.q +cross(q, p). So we can fully encode our quaternion arithmetic in terms of real arithmetic and these two functions of imaginary parts.
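
In code, reusing qmul from the sketches above (qdot and qcross are my names), these definitions do indeed reproduce the coordinate formulae just derived:

def qdot(q, p):
    # dot(q, p) = -(q.p +p.q)/2; real (only the first entry non-zero) for imaginary q, p
    return tuple(-(a + b)/2 for a, b in zip(qmul(q, p), qmul(p, q)))

def qcross(q, p):
    # cross(q, p) = (q.p -p.q)/2; imaginary for imaginary q, p
    return tuple((a - b)/2 for a, b in zip(qmul(q, p), qmul(p, q)))

q = (0, 1, 2, 3)    # x.i +y.j +z.k with (x, y, z) = (1, 2, 3)
p = (0, 4, -1, 2)   # (r, s, t) = (4, -1, 2)
assert qdot(q, p) == (1*4 + 2*(-1) + 3*2, 0, 0, 0)
assert qcross(q, p) == (0, 2*2 - 3*(-1), 3*4 - 1*2, 1*(-1) - 2*4)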

Let's now look at what dot(q, p) and cross(q, p) look like when we interpret q and p as displacements in three-dimensional co-ordinates, with the coefficients of i, j and k as the lengths of components of the displacement parallel to three mutually-perpendicular axes. As conjugation is negation, it corresponds to reversing the displacement.

Simple identities

First consider the case of a self-product. Since cross is anti-symmetric, cross(q, q) is inevitably zero. For dot, we get dot(q, q) = −q.q = x.x +y.y +z.z, the usual pythagorean combination of co-ordinates to obtain the squared length of the displacement.

The second of these implies that dot(p, q) = 0 precisely if p and q are mutually perpendicular: the squared length of u.p +v.q for real u, v is dot(u.p +v.q, u.p +v.q) = u.u.dot(p, p) +v.v.dot(q, q), thanks to the dot(p, q) and dot(q, p) terms dropping out. This is just the sum of the squared lengths of u.p and v.q; by the cosine rule, this implies the angle between them has cosine 0, i.e. is ±turn/4 modulo turn. Now, dot(p, q) = 0 is equivalent to q.p +p.q = 0, from the definition of dot, so orthogonal imaginary quaternions anticommute.

When p and q aren't given to be orthogonal, the symmetry of dot gives us: dot(p +q, p +q) = dot(p, p) +dot(q, q) +2.dot(p, q). If we interpret p +q as the displacement vector along one side of a triangle and p, q as the displacements along first one of the others, then the last, via an angle a opposite p +q, we see that 2.dot(p, q) is the discrepancy between the square of the p +q side's length and the sum of squares of the other two sides. The cosine rule tells us this is just −2.Cos(a) times the product of the lengths of the p and q sides, and −Cos(a) is the Cos of the external angle of our triangle (a's complement in a straight line), i.e. the angle we have to turn by to get from the direction of p to the direction of q. That external angle is equally the angle we would get between p and q if they started in the same place, instead of one of them starting where the other ends. So dot(p, q) is just the product of the lengths of p and q times the cosine of the angle between them.

Double-cross

Next, let's look at what happens when we combine two vectors with cross and then combine the result with one of the two we started with, using dot. As before, we can take q = x.i +y.j +z.k, p = r.i +s.j +t.k for some reals x, y, z, r, s, t:

dot(p, cross(p, q))
= dot(r.i +s.j +t.k, (z.s −y.t).i +(x.t −z.r).j +(y.r −x.s).k)
= r.(z.s −y.t) +s.(x.t −z.r) +t.(y.r −x.s)
= r.s.z −r.t.y +s.t.x −r.s.z +r.t.y −s.t.x
= 0

implying cross(p, q) is perpendicular to p; by (anti)symmetry this in turn implies it is perpendicular to q.

cross(p, cross(p, q))
= cross(r.i +s.j +t.k, (z.s −y.t).i +(x.t −z.r).j +(y.r −x.s).k)
= (s.(y.r −x.s) −t.(x.t −z.r)).i +(t.(z.s −y.t) −r.(y.r −x.s)).j +(r.(x.t −z.r) −s.(z.s −y.t)).k
= (r.(r.x +s.y +t.z) −(r.r +s.s +t.t).x).i +(s.(r.x +s.y +t.z) −(r.r +s.s +t.t).y).j +(t.(r.x +s.y +t.z) −(r.r +s.s +t.t).z).k
= dot(p, q).p −dot(p, p).q

In particular, when p and q are perpendicular, cross(p, cross(p, q)) = −dot(p, p).q, the result of applying two quarter turns to q and scaling by the squared length of p. Given that we already know cross(p, q) is perpendicular to q and p, hence the scaled result of a quarter turn of q about p, and that cross(p) must act on it the same way it acts on q, the scaling each application involves must in fact be the length of p. Thus cross(p) rotates any vector perpendicular to p through a right angle and scales it by the length of p; and it maps any vector parallel to p to zero, as cross is antisymmetric.
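
The helpers above make this identity, and the perpendicularity that preceded it, easy to test (scale and qadd are trivial helpers of mine):

def scale(s, q):
    return tuple(s * c for c in q)

def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))

p = (0, 2, -1, 3)
q = (0, 1, 8, 4)
assert qdot(p, qcross(p, q))[0] == 0   # cross(p, q) is perpendicular to p
assert qdot(q, qcross(p, q))[0] == 0   # ... and to q
lhs = qcross(p, qcross(p, q))
rhs = qadd(scale(qdot(p, q)[0], p), scale(-qdot(p, p)[0], q))
assert close(lhs, rhs)                 # cross(p, cross(p, q)) = dot(p, q).p -dot(p, p).q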

Length of the cross product

For a general q (not necessarily perpendicular to p), let's look at the length of cross(p, cross(p, q)) = dot(p, q).p −dot(p, p).q, taking a to be the angle between p and q, so that dot(p, q) = abs(p).abs(q).Cos(a); the length's square is

dot(dot(p, q).p −dot(p, p).q, dot(p, q).p −dot(p, p).q)
= dot(p, q).dot(p, q).dot(p, p) −2.dot(p, q).dot(p, q).dot(p, p) +dot(q, q).dot(p, p).dot(p, p)
= dot(q, q).dot(p, p).dot(p, p) −dot(p, q).dot(p, q).dot(p, p)
= dot(q, q).dot(p, p).dot(p, p).(1 −Cos(a).Cos(a))
= dot(q, q).dot(p, p).dot(p, p).Sin(a).Sin(a)

Thus the length of cross(p, cross(p, q)) is abs(q).dot(p, p).Sin(a); and, as cross(p, q) is perpendicular to p, this is just the result of scaling it by abs(p) and quarter-turning it about p; so dividing its length by abs(p) gets the length of cross(p, q) as abs(q).abs(p).Sin(a). So cross(p, q) is perpendicular to both p and q and its length is the product of their lengths times the sine of the angle between them.
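
Trading Sin back for Cos, the squared length of cross(p, q) is dot(p, p).dot(q, q) −dot(p, q).dot(p, q); with p, q as in the previous sketch, this too is easy to check:

pp, qq, pq = qdot(p, p)[0], qdot(q, q)[0], qdot(p, q)[0]
c = qcross(p, q)
# dot(c, c) = dot(p, p).dot(q, q).Sin(a).Sin(a) = dot(p, p).dot(q, q) -dot(p, q).dot(p, q)
assert qdot(c, c)[0] == pp*qq - pq*pq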

Now let's decompose a vector q into components parallel and perpendicular to p; the parallel component is h.p for some scalar h; since the two components add up to q, that leaves q −h.p as the remainder, which must be perpendicular to p; whence 0 = dot(p, q −h.p) = dot(p, q) −h.dot(p, p) and h = dot(p, q) / dot(p, p). The length of h.p = dot(p, q).p / dot(p, p) is just dot(p, q)/abs(p), which is abs(q) times the cosine of the angle between p and q; since the sum is q, of length abs(q), the perpendicular component q −dot(p, q).p / dot(p, p) has length abs(q) times the sine of that angle. Since cross(p) maps h.p to zero, cross(p, q) is just the image under cross(p) of q −dot(p, q).p / dot(p, p), that is, the result of quarter-turning this perpendicular component about p and scaling by abs(p); and, indeed, it has length abs(q).abs(p) times the sine of the angle between them. So we can picture cross(p) as projecting its input parallel to p onto the plane perpendicular to p, quarter-turning the result within that plane and scaling by the length of p.

A caveat

The quaternions provide us with a way to describe the geometry of three-dimensional space algebraically. However, this does not generalise to other than three dimensions, and the proper understanding of the relevant geometric concepts in other dimensions requires the overt analysis of a metric – which dot encodes, in the quaternions – and a related fully-antisymmetric form (a square root of the metric's determinant, give or take the sign of the latter) expressed here by cross. Furthermore, certain peculiarities of three dimensions lead to these having a significantly simpler (or, at least, simpler-seeming) form than arises in general dimension.

In particular, the apparent simplicity in the quaternions conceals the distinctions between a vector space, its dual and certain antisymmetrising tensor spaces derived from these. That concealment obfuscates important parts of what is really happening with dot and cross, and more generally with vector algebra; in my opinion it actively makes the subject harder to understand, through the need to unpick the aspects the quaternions conflate.

So, while the quaternions were historically one of the initial inspirations for the development of vector calculus, and provide a nice illustration of how arithmetic can interact with vector algebra, I do not use it as the start-point for the latter subject, much less as a source for notational inspiration. I regard it more as a curiosity of arithmetic, that may in some particular cases be of use.

In contrast, the misnamed complex numbers play a vital and vibrant rôle in both algebra and analysis, ensuring polynomials can always be expressed as products of linear factors and providing a context in which merely being differentiable in a small neighbourhood implies a function's behaviour everywhere. These are qualitatively more significant features of the complex algebra, that make it an essential cornerstone of mathematics.

3-D rotations

Next consider how we arrived at rotations: if we have an imaginary unit p, I reasoned above that (Cos(a) +p.Sin(a)).h.(Cos(a) −p.Sin(a)) has the same real part as h, while its imaginary part is rotated about p through 2.a. Let's now verify that in terms of dot and cross. We can write h = w +q for some real w = (*(h) +h)/2 and imaginary q = (h −*(h))/2. Expanding that:

(Cos(a) +p.Sin(a)).h.(Cos(a) −p.Sin(a))
= Cos(a).h.Cos(a) +p.Sin(a).h.Cos(a) −Cos(a).h.p.Sin(a) −p.Sin(a).h.p.Sin(a)
= h.Cos(a).Cos(a) +(p.q −q.p).Cos(a).Sin(a) −p.h.p.Sin(a).Sin(a)
= h.Cos(a).Cos(a) +2.cross(p, q).Cos(a).Sin(a) −p.h.p.Sin(a).Sin(a)

in which

p.h.p
= p.(w +q).p
= p.w.p +p.q.p
= w.p.p +p.(cross(q, p) −dot(q, p))
= −w +p.cross(q, p) −dot(q, p).p
= dot(p, cross(p, q)) −w +cross(p, cross(q, p)) −dot(q, p).p
= −w +dot(p, p).q −dot(q, p).p −dot(q, p).p
= −w +dot(p, p).q −2.dot(q, p).p

and p is given to be a unit, so dot(p, p) = 1; thus,

(Cos(a) +p.Sin(a)).h.(Cos(a) −p.Sin(a))
= h.Cos(a).Cos(a) +2.cross(p, q).Cos(a).Sin(a) +(w −q +2.dot(q, p).p).Sin(a).Sin(a)
= w +q.(Cos(a).Cos(a) −Sin(a).Sin(a)) +2.cross(p, q).Cos(a).Sin(a) +2.dot(q, p).p.Sin(a).Sin(a)

We can now use the double-angle formulae, Sin(2.a) = 2.Sin(a).Cos(a) and Cos(2.a) = Cos(a).Cos(a) −Sin(a).Sin(a) to express this as:

= w +q.Cos(2.a) +cross(p, q).Sin(2.a) +dot(q, p).(1 −Cos(2.a)).p
= w +dot(q, p).p +(q −dot(q, p).p).Cos(2.a) +cross(p, q).Sin(2.a)

Sure enough, this has the same real part, w, as h. As for its imaginary part, dot(q, p).p is just the component of q parallel to p and q −dot(q, p).p the component perpendicular to p, of which we retain a Cos(2.a) fraction while adding the result of quarter-turning it about p, cross(p, q), scaled by Sin(2.a); these, indeed, add up to the result of rotating the perpendicular part of q about p through 2.a in the plane perpendicular to p. So, indeed, we have verified that our rotation is as expected.

I've done that verification for the case of a unit quaternion, where dividing on the right is just multiplication by its conjugate. More generally, as noted above, multiplying by the quaternion on the left and dividing by it on the right has the same effect as, first, dividing the quaternion by its abs and then multiplying by the resulting unit on the left and its conjugate on the right; so the foregoing suffices also to establish that multiplying on the left and dividing on the right likewise implements a rotation, through twice an angle whose ratio of Cos to Sin is the same as the ratio of real part to abs of imaginary part.
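
To round this off, here's a numeric comparison, reusing the helpers from the sketches above, of the quaternion sandwich against the dot-and-cross form just derived (axis and the other names are mine; the axis used is the unit imaginary 0.6.j +0.8.k):

from math import cos, sin

a = 0.4
axis = (0.0, 0.0, 0.6, 0.8)                  # a unit imaginary p = 0.6.j +0.8.k
g = (cos(a), 0.0, 0.6*sin(a), 0.8*sin(a))    # Cos(a) +p.Sin(a)
h = (1.5, 2.0, -1.0, 0.5)                    # h = w +q
sandwich = qmul(qmul(g, h), conj(g))

q = (0.0,) + h[1:]                           # the imaginary part of h
par = scale(qdot(q, axis)[0], axis)          # component of q parallel to p
perp = qadd(q, scale(-1.0, par))             # component of q perpendicular to p
# w +dot(q, p).p +(q -dot(q, p).p).Cos(2.a) +cross(p, q).Sin(2.a):
formula = qadd(qadd((h[0], 0.0, 0.0, 0.0), par),
               qadd(scale(cos(2*a), perp), scale(sin(2*a), qcross(axis, q))))
assert close(sandwich, formula)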

Computational efficiency

Dividing by a quaternion generally will be done by multiplying by its conjugate and then dividing by its squared absolute value. Pause to notice that each multiplication of quaternions involves 16 real multiplications and 12 real additions of the resulting products; two multiplications will take twice that, but the particular case where the first and last factors in a product of three are mutually conjugate may well offer some simplification that can save some of that computation. So consider:

(w +x.i +y.j +z.k).i.(w −x.i −y.j −z.k)
= (w.w +x.x −y.y −z.z).i +2.(x.y +w.z).j +2.(x.z −w.y).k
(w +x.i +y.j +z.k).j.(w −x.i −y.j −z.k)
= (w.w +y.y −z.z −x.x).j +2.(y.z +w.x).k +2.(x.y −w.z).i
(w +x.i +y.j +z.k).k.(w −x.i −y.j −z.k)
= (w.w +z.z −x.x −y.y).k +2.(x.z +w.y).i +2.(y.z −w.x).j

whence

(w +x.i +y.j +z.k).(r +s.i +t.j +u.k).(w −x.i −y.j −z.k)
= r.(w.w +x.x +y.y +z.z) +s.((w.w +x.x −y.y −z.z).i +2.(x.y +w.z).j +2.(x.z −w.y).k) +t.((w.w +y.y −z.z −x.x).j +2.(y.z +w.x).k +2.(x.y −w.z).i) +u.((w.w +z.z −x.x −y.y).k +2.(x.z +w.y).i +2.(y.z −w.x).j)
= r.(w.w +x.x +y.y +z.z) +(s.(w.w +x.x −y.y −z.z) +2.t.(x.y −w.z) +2.u.(x.z +w.y)).i +(t.(w.w +y.y −x.x −z.z) +2.u.(y.z −w.x) +2.s.(x.y +w.z)).j +(u.(w.w +z.z −x.x −y.y) +2.s.(x.z −w.y) +2.t.(y.z +w.x)).k

Since the first and last factors have the same real coefficients, aside from sign, we end up with six products of them plus their four squares, that we combine by 18 additions to make the coefficients that multiply the middle factor's real coefficients; these then require ten more multiplications (assuming doubling comes at negligible cost) and six more additions, for a total of 20 multiplications and 24 additions. This works out more efficient than the brute force 32 multiplications and 24 additions, especially if we're going to divide all terms by w.w +x.x +y.y +z.z, which saves us one of our multiplications (by that factor) and the need to do this division for the real part.

Given that we're working out that sum of squares, the sum/difference terms in those squares can be obtained by summing the two squares to be subtracted, doubling the result and subtracting that from the already computed sum of all four squares. This saves us three more additions. So this computation can be done more efficiently than simply multiplying three values and dividing by the real product of first and last.

That gain in efficiency also comes with a gain in accuracy, at least when any of the terms that cancelled out in deriving the formulae above are large.
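
Here's how that reckoning might look in code, as a sketch following the expanded formulae above (names mine; I make no claim that this is how any particular library implements it):

def rotate(q, v):
    # Compute q.v.*(q)/abs(q)/abs(q) for v = (r, s, t, u): ten products of
    # coefficients of q, the three w.w +x.x -y.y -z.z style combinations via
    # the doubling trick, then the coefficients applied to (s, t, u); the
    # real part r needs neither transformation nor division.
    w, x, y, z = q
    r, s, t, u = v
    ww, xx, yy, zz = w*w, x*x, y*y, z*z
    wx, wy, wz, xy, xz, yz = w*x, w*y, w*z, x*y, x*z, y*z
    n = ww + xx + yy + zz       # abs(q) squared
    a = n - 2*(yy + zz)         # w.w +x.x -y.y -z.z
    b = n - 2*(xx + zz)         # w.w +y.y -x.x -z.z
    c = n - 2*(xx + yy)         # w.w +z.z -x.x -y.y
    return (r,
            (s*a + 2*(t*(xy - wz) + u*(xz + wy))) / n,
            (t*b + 2*(u*(yz - wx) + s*(xy + wz))) / n,
            (u*c + 2*(s*(xz - wy) + t*(yz + wx))) / n)

# 1 +i +j +k rotates through a third of a turn about i +j +k, taking i to j:
assert rotate((1.0, 1.0, 1.0, 1.0), (5.0, 1.0, 0.0, 0.0)) == (5.0, 0.0, 1.0, 0.0)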

Can we do that again ?

What happens if we now try to add yet another imaginary unit ? Let's call it u and see what happens, assuming we can still have abs and its associated conjugation. Considering 1, i, u, i.u in exactly the same way as we did for 1, i, j, i.j above, we inevitably conclude that u.i = *(i.u) = −i.u = 1/(i.u); and we can repeat the same process with each of j, k in place of i, ensuring u anticommutes with each of the other imaginary units.

However, this leads to a problem: consider +1 = k.k.u.u = −k.u.k.u, thanks to antisymmetry, swapping the central k.u; now substitute k = i.j to get +1 = −i.j.u.i.j.u = j.i.u.i.j.u = j.u.j.u = u.u = −1.

It turns out that the assumption of associativity has broken down: I've been tacitly assuming that we can drop the parentheses in ((i.j).u).((i.j).u) and regroup to exploit antisymmetry. While it is possible to go past the quaternions to an 8-dimensional algebra, called the Octonions (and beyond, with further doublings), it's not associative (hence not a ringlet), thereby straying a little further from what we normally think of as arithmetic than I care to include in my discussion of that topic.
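
For completeness, here's a sketch of that doubling, reusing qmul and conj from the sketches above; it represents an octonion as a pair of quaternions under the Cayley–Dickson product (in one common sign convention; others exist) and exhibits exactly the failure of associativity just described:

def omul(p, q):
    # Cayley-Dickson doubling: an octonion is a pair (a, b) of quaternions,
    # multiplied by (a, b).(c, d) = (a.c -*(d).b, d.a +b.*(c)).
    a, b = p
    c, d = q
    return (tuple(m - n for m, n in zip(qmul(a, c), qmul(conj(d), b))),
            tuple(m + n for m, n in zip(qmul(d, a), qmul(b, conj(c)))))

zero = (0, 0, 0, 0)
i8 = ((0, 1, 0, 0), zero)   # i
j8 = ((0, 0, 1, 0), zero)   # j
u8 = (zero, (1, 0, 0, 0))   # the new unit u

assert omul(u8, u8) == ((-1, 0, 0, 0), zero)              # u.u = -1
assert omul(omul(i8, j8), u8) != omul(i8, omul(j8, u8))   # associativity fails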


Written by Eddy.