Differentiation: quantifying variation

The differential calculus quantifies rates of variation of functions by providing, for a given function evaluated at a specified input, a linear map from small changes in the input to an estimate of the resulting change in the output. It only does this where such an estimate can be highly accurate near the specified input; specifically, when the scale of its errors shrinks significantly faster than the scale of the small change in input, as the input draws closer to the specified input.

Orthodoxy generally introduces differentiation in the context of functions from the real number line to itself; in this context, a linear map from (small changes in) input to (small changes in) output can faithfully be represented by a real number, namely the change in output divided by the change in input. This leads to deriving derivatives of functions between vector spaces via partial derivatives, that quantify how one co-ordinate of the output (given some co-ordinate system there) varies with one co-ordinate of the input (given likewise), while keeping all other co-ordinates of the input fixed. From this it becomes possible to synthesize a linear map from inputs to outputs. However, in general, a derivative (of a function between linear spaces) is indeed a linear map from inputs to outputs; and it is entirely possible to introduce the subject in such terms, without going via any co-ordinate systems or giving primacy to the real derivative. In my opinion the result is indeed clearer, for all that it requires an understanding of linear spaces, by virtue of making clear the distinction between input space, output space and (gradient as) linear map from the former to the latter.

This requires, of course, the theory of linear spaces; in particular, we'll need to be able to infer a gradient for any chord, between inputs at the vertices of a voluminous simplex, that maps the vectors along each edge of the simplex to the exact differences between the outputs at the edge's ends; and we'll need a notion of scale of difference by which to demand that differences between these linear maps shrink significantly faster than the scale of a simplex in whose interior our specified input lies. The latter can certainly be done by the use of hermitian forms; I suspect it can also be done by judicious use of the scaling properties of simplices, at least in the case of spaces linear over (some sub-field of) the real (rather than complex) numbers. The gradient replaces real analysis's ratio of change in output to change in input; the notion of scale quantifies the sense in which the derivative provides sufficiently accurate estimates of the changes in output.

Chords

When differentiation is introduced in terms of functions from reals to reals, we divide a (possibly zero) change in output by a (definitely non-zero, but typically small) change in input to get a gradient or rate of change between two evaluations of the function; on a graph, this is represented by a straight line between two points on the curve representing the function; this is a chord of the curve. The slope of the chord gives us our gradient. All chords reasonably close to a given point on the curve have, provided the curve is differentiable there, gradients reasonably close to one another and to that of the tangent to the curve at our given point. If we multiply a gradient by the change in input along a chord, we get a tolerably good estimate of the change in output between its end-points.
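As a minimal sketch of the real-to-real case just described, the following Python computes chord gradients for the squaring function about the input 1. The names chord_gradient and square are illustrative choices, not taken from the text, and exact fractions stand in for the reals.

```python
from fractions import Fraction

def chord_gradient(f, u, v):
    """Slope of the chord of f between two distinct inputs u and v."""
    if u == v:
        raise ValueError("a chord needs a non-zero change in input")
    return (f(v) - f(u)) / (v - u)

def square(t):
    return t * t

x = Fraction(1)  # the input about which we take chords
for h in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    u, v = x - h, x + 2 * h  # a chord with x between its ends
    slope = chord_gradient(square, u, v)
    # Gradient times change in input recovers the change in output exactly.
    assert slope * (v - u) == square(v) - square(u)
    print(f"h = {h}: chord gradient = {slope}")
```

Each of these chords has gradient u + v = 2 + h, so the gradients close in on 2, the derivative of squaring at 1, as the chord shrinks.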

So, what can we multiply by displacements near our point, to estimate changes in output of a function evaluated at both ends of the displacement, when the displacements aren't necessarily all in the same direction ? The answer is, of course, a linear map; and a linear map is determined by its value on a basis of its input space; so we need as many linearly independent displacements as the dimension of our input space. That means we need to evaluate our function not just at two points but, when the input space has dimension n, at 1+n points. As long as the displacements from one of these points to each of the rest are linearly independent, they form a basis; which has a dual, each member of which we can tensor multiply by the change in output between the end-points of the corresponding displacement; summing over the whole basis, we get a linear map which maps each displacement from the given one point to any of the others to exactly the change in our function's value between the two points in question. This linear map, then, is our estimate of the derivative, derived from a set of points, presumed to be near the input at which we sought a derivative.

Indeed, any 1+n points in an n-dimensional input space suffice, provided the displacements from any one to each of the others are linearly independent, to determine a linear map that we can think of as the gradient of the function among those points; so this set of 1+n points serves in place of the two end-points of our chord, in the n = 1 case, and the linear map they induce corresponds to the slope of the chord. We thus generalise our 1-dimensional chord, with two ends, to an n-dimensional chord, with 1+n vertices, that suffices to determine a gradient of our function. We can then contrive one mechanism or another for finding what gradient we get in the limit as the 1+n points draw in on the point at which we were trying to determine a derivative.
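The construction above (dual basis of the displacements, tensored with the changes in output) can be sketched in Python as follows. This is one possible rendering under stated assumptions: points are tuples of exact fractions, the last vertex is the one all displacements start from, and the helper names inverse, chord_gradient and apply are hypothetical.

```python
from fractions import Fraction

def inverse(rows):
    """Invert a square matrix, exactly, by Gauss-Jordan elimination."""
    n = len(rows)
    work = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
            for i, row in enumerate(rows)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if work[r][col] != 0), None)
        if pivot is None:
            raise ValueError("displacements are not linearly independent")
        work[col], work[pivot] = work[pivot], work[col]
        scale = work[col][col]
        work[col] = [a / scale for a in work[col]]
        for r in range(n):
            if r != col:
                factor = work[r][col]
                work[r] = [a - factor * b for a, b in zip(work[r], work[col])]
    return [row[n:] for row in work]

def chord_gradient(f, vertices):
    """Matrix (list of rows) of the linear map taking each displacement out
    of the last vertex to the exact change in f along that displacement."""
    *others, base = vertices
    n = len(base)
    if len(others) != n:
        raise ValueError("need 1+n vertices in an n-dimensional space")
    edges = [[a - b for a, b in zip(v, base)] for v in others]
    f_base = f(base)
    changes = [[a - b for a, b in zip(f(v), f_base)] for v in others]
    inv = inverse(edges)  # column j of inv is the dual-basis member for edges[j]
    m = len(f_base)
    # Sum, over j, of (change in output j) tensor (dual-basis member j).
    return [[sum(changes[j][k] * inv[l][j] for j in range(n)) for l in range(n)]
            for k in range(m)]

def apply(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def f(point):
    a, b = point
    return (a * b, a + b * b)

vertices = [(Fraction(3), Fraction(1)), (Fraction(1), Fraction(4)),
            (Fraction(0), Fraction(0))]
gradient = chord_gradient(f, vertices)
for v in vertices[:-1]:
    edge = [a - b for a, b in zip(v, vertices[-1])]
    change = [a - b for a, b in zip(f(v), f(vertices[-1]))]
    assert apply(gradient, edge) == change  # exact on every edge out of the base
```

Inverting the matrix of displacements is just one way to obtain the dual basis; any exact linear solver would serve equally well.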

Simplices

Now, we have 1+n points in an n-dimensional linear space; this is exactly what it takes to define a simplex, which generalises point = 0-simplex, line = 1-simplex, triangle = 2-simplex, tetrahedron = 3-simplex. For each natural n, there's a canonical n-simplex

S(n) = {mapping ({positives}: p :1+n): sum(p) = 1}

(this even gives us an S(−1) = {}, the empty set of empty lists whose sum is 1). Note that ({positives}: p :1+n) only lets p(i) be meaningful for i in 1+n, but doesn't guarantee every i in 1+n; the i in 1+n for which there is no p(i) can be treated as if p(i) were 0 (which isn't positive). We can express any 1+n points in an n-dimensional space N as a list (N: v |1+n); the simplex which has the v(i) as its vertices is just {sum(p.v): p in S(n)}, where sum(p.v) = sum(: p(i).v(i) ←i :); each v(i) arises from a ({positives}: p :1+n) that's actually 1←i, with p(i) = 1 and no p(j) defined other than i = j. Any p for which (:p|) isn't all of 1+n ignores at least one of the v(j), for some j for which there is no p(j), and thus is on a face of our simplex, which ensures that there are points outside our simplex as close to sum(p.v) as we care to ask for; such a sum(p.v) is on the boundary of our simplex.
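To make the partial-list convention concrete, here is a small Python sketch in which a member p of S(n) is modelled as a dict from vertex index to positive weight, with absent indices standing for the missing p(i). The names simplex_point and on_face are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def simplex_point(p, v):
    """sum(p.v) for a partial list p of positive weights, given as a dict
    from vertex index to weight; absent indices contribute nothing."""
    if any(w <= 0 for w in p.values()):
        raise ValueError("every weight present must be positive")
    if sum(p.values()) != 1:
        raise ValueError("weights must sum to 1")
    dimension = len(v[0])
    return tuple(sum(w * v[i][k] for i, w in p.items()) for k in range(dimension))

def on_face(p, v):
    """True when p omits some vertex, so that sum(p.v) lies on a face."""
    return len(p) < len(v)

triangle = [(0, 0), (1, 0), (0, 1)]  # 1+n = 3 points in a 2-dimensional space
third = Fraction(1, 3)
weights = {0: third, 1: third, 2: third}
centroid = simplex_point(weights, triangle)
vertex = simplex_point({1: Fraction(1)}, triangle)  # the list 1←1
midpoint = simplex_point({0: Fraction(1, 2), 2: Fraction(1, 2)}, triangle)
print(centroid, on_face(weights, triangle))
print(midpoint, on_face({0: Fraction(1, 2), 2: Fraction(1, 2)}, triangle))
```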

Now, I mentioned the displacements from any one of our vertices to each of the others, that I want to have linearly independent. Let's pause to notice it doesn't matter which of those vertices we fix as start-point for all those displacements. Given (N: v |1+n), we can infer e(i) = (: v(j) −v(i) ←j :1+n) and restrict it so that e(i, j) is only defined for j and i distinct (e(i, i) would just be zero, after all); each e(i) is then a partial list of n displacements in our space N. Now, a little arithmetic reveals that we can obtain any e(j) from e(i) as e(j, i) = −e(i, j) while, for all k other than i, j, e(j, k) = e(i, k) −e(i, j). One entry in e(j) is a non-zero-scaled version of an entry in e(i); all the other entries in e(j) are just the remaining entries in e(i) displaced by a multiple of that one entry; in such a case, e(j) is linearly independent precisely if e(i) is. So the displacements out of one vertex, to each of the others, are linearly independent, or not, independent of which one vertex we chose to start from. It is thus convenient to use e(n), which I'll now refer to simply as E = (: v(i) −v(n) ←i |n).
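The claim that the choice of start-point doesn't matter can be checked numerically. The sketch below assumes points given as coordinate tuples and uses a determinant (computed exactly by elimination) as the test of linear independence; determinant and edges_from are hypothetical helper names.

```python
from fractions import Fraction

def determinant(rows):
    """Exact determinant by Gaussian elimination with row swaps."""
    work = [[Fraction(x) for x in row] for row in rows]
    n = len(work)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if work[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            work[col], work[pivot] = work[pivot], work[col]
            det = -det
        det *= work[col][col]
        for r in range(col + 1, n):
            factor = work[r][col] / work[col][col]
            work[r] = [a - factor * b for a, b in zip(work[r], work[col])]
    return det

def edges_from(v, i):
    """The displacements e(i, j) = v(j) - v(i), for each j other than i."""
    return [[a - b for a, b in zip(v[j], v[i])] for j in range(len(v)) if j != i]

tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (1, 1, 3)]
flat = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]  # all four in one plane

for name, v in (("tetrahedron", tetrahedron), ("flat", flat)):
    dets = [determinant(edges_from(v, i)) for i in range(len(v))]
    print(name, dets)
```

Changing the start-point can flip the sign of the determinant but not whether it is zero, which is all that linear independence depends on.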

Interior

I'm fairly sure this is only (directly) applicable to the case of linear spaces over (some sub-field of) the reals, although there may be some way to salvage it for the complex numbers. At the very least, one can do the real analysis on a linear space over complex, construed as the twice-dimensional linear space over reals, then reason about whether the inferred real-linear derivative does represent a complex-linear map. None the less, I'll pose the following in terms of hermitian forms; if we're limited to reals, with their fatuous conjugate, these shall simply reduce to quadratic forms.

So, back to our simplex {sum(p.v): p in S(n)} with sum(p.v) on the boundary unless ({positives}: p |1+n). All displacements within our simplex are in the span of E; if this isn't the whole of our linear space N, there is some direction u in N for which, for every sum(p.v) in our simplex, sum(p.v) +t.u isn't in our simplex for any non-zero scalar t; as a result, there are points outside our simplex arbitrarily close to any point of our simplex; the whole simplex is thus boundary; it has no interior. Conversely, if the n vectors in E do span our n-dimensional N, they must be linearly independent and form a basis; consider any x = sum(p.v) with p(i) positive for all i in 1+n; as n is finite and each p(i) is positive, there is a positive q, the least entry in p, for which any sum(r.v), with sum(r) = 1 and every r(i) within q of p(i), lies inside the simplex. We can define the obvious positive-definite hermitian form derived from our basis E by summing the tensor-squares of the members of its dual; every point within q of x in the induced distance of this hermitian form lies within our simplex. For any positive-definite hermitian form on N, we can now simultaneously diagonalise this and our E-derived hermitian form, with the former unit-diagonalised; since E is a basis, the E-derived form is positive-definite too, so its diagonal entries are all positive; select a positive r for which r.r is not less than any diagonal entry of the E-derived form, when the other is unit-diagonalised; then every point of N within q/r of x, in the distance induced from the other hermitian form, is within q of x in the E-derived one, hence inside the simplex. Consequently, for the topology induced from the distance determined by any positive-definite hermitian form on N, sum(p.v) is in the interior of our simplex. So our simplex has an interior precisely if its edges, out of any one vertex, are linearly independent; and that interior consists exactly of those sum(p.v) for which (:p|) is 1+n.

Note that this sense of interior can be defined entirely in terms of simplices and linear independence, without reference to orthodox topology or hermitian forms; the reasoning above merely establishes that the notion of interior that it gives shares enough, for my purposes, in common with those that arise from orthodoxy.

I describe a simplex with an interior, in this sense, as voluminous (since it does indeed have a (non-zero) volume, as determined by any metric induced from a positive-definite hermitian form). I describe such a simplex as being about any point in its interior.
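Whether a simplex is about a given point can be tested by solving for the weights p with x = sum(p.v) and sum(p) = 1, then checking that every weight is positive. The following is a rough sketch along those lines, for points given as coordinate tuples; barycentric and is_about are names invented for this illustration.

```python
from fractions import Fraction

def barycentric(v, x):
    """Weights p, one per vertex, with sum(p) = 1 and sum(p.v) = x.
    Raises ValueError when the simplex is not voluminous."""
    *others, base = v
    n = len(base)
    # Row k of the augmented system: sum over i of c(i).(v(i) - base)[k] = (x - base)[k].
    rows = [[Fraction(u[k] - base[k]) for u in others] + [Fraction(x[k] - base[k])]
            for k in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("simplex is not voluminous")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [a / scale for a in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    c = [row[n] for row in rows]
    return c + [1 - sum(c)]  # the base vertex takes whatever weight is left

def is_about(v, x):
    """True when x lies in the interior of the simplex with vertices v."""
    return all(w > 0 for w in barycentric(v, x))

triangle = [(0, 0), (3, 0), (0, 3)]
print(barycentric(triangle, (1, 1)), is_about(triangle, (1, 1)))
print(barycentric(triangle, (1, 0)), is_about(triangle, (1, 0)))  # on an edge
print(barycentric(triangle, (3, 3)), is_about(triangle, (3, 3)))  # outside
```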

Each simplex about a point x provides us with a chord of any function defined throughout that simplex; and that chord supplies us with a gradient for the function, between the simplex's vertices; discussion of differentiability then comes down to whether all sufficiently small simplices about x give chord-gradients sufficiently close to some particular gradient to allow us to accept that gradient as the derivative of the function at x. I'll come back to a more orthodox approach to this, but first let me sketch how we can do it, at least for linear spaces over an ordered field, using nothing but simplices: we need them for the chords, so why not use them also for the notion of nearness ?
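To illustrate chord-gradients of small simplices about x settling on a particular gradient, here is a sketch for the product function on the plane, about x = (1, 2). The function, the choice of offsets and the helper names are all assumptions made for the example; the 2-dimensional gradient is found by Cramer's rule.

```python
from fractions import Fraction

def f(point):
    a, b = point
    return a * b

def chord_gradient_2d(f, v0, v1, v2):
    """Covector (ga, gb) with ga*e[0] + gb*e[1] equal to the change in f
    along each edge e out of v2; solved by Cramer's rule."""
    e1 = (v0[0] - v2[0], v0[1] - v2[1])
    e2 = (v1[0] - v2[0], v1[1] - v2[1])
    d1, d2 = f(v0) - f(v2), f(v1) - f(v2)
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("simplex is not voluminous")
    return ((d1 * e2[1] - e1[1] * d2) / det, (e1[0] * d2 - d1 * e2[0]) / det)

x = (Fraction(1), Fraction(2))
offsets = [(2, -1), (-1, 2), (-1, -1)]  # these average to zero, so x is the centroid

def gradient_at_scale(t):
    """Chord gradient of f across the triangle x + t.offsets."""
    vertices = [(x[0] + t * a, x[1] + t * b) for a, b in offsets]
    return chord_gradient_2d(f, *vertices)

for t in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    print(t, gradient_at_scale(t))
```

For this family of triangles the chord gradient works out as (2 − t, 1 − t), which closes in on (2, 1), the derivative of the product at (1, 2), in proportion to the scale t of the simplex.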

If the complex case can be salvaged, it'll be by using a 2.n-simplex in the complex n-dimensional space, since it's a 2.n-dimensional space over reals; and the edges of this simplex must real-span the input space for the simplex to be voluminous. This provides the simplices whose interiors do the work of topology; for chords, we'll still be using lists of 1+n points, whose differences span the complex space.

Using only simplices

Since we naturally involve simplices in order to get chords across which to obtain gradients, the question naturally arises: do we need anything else ? At least for the (sub-field of) real case, it turns out we don't; this might generalise to the complex case if the last subsection's reservations can be overcome.

Suppose we have some sub-ringlet R of {reals}, whose positives are dense in those of reals; and a (not necessarily linear) mapping (M: f |N) from n-dimensional R-linear space N to R-linear space M. Let {positives} be the collection of R's positive values.

Given a simplex, characterised by vertex list (N: v |1+n), about a point x, we can rescale the simplex towards or away from x simply by replacing v with (: x +t.(v(i) −x) ←i |1+n) for some positive t. The ({positives}: p |1+n) in S(n), so sum(p) = 1, for which x = sum(p.v) then has sum(: p(i).(x +t.(v(i) −x)) ←i |1+n) = sum(p).x +t.(sum(p.v) −sum(p).x) = x since sum(p).x = x = sum(p.v). Thus the same p that makes x an interior point of v's simplex also gets x as its interior point for each rescaled x +t.(v −x) simplex.
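The rescaling argument can be checked directly. In this sketch (with hypothetical helpers combine and rescale, and a triangle chosen purely for illustration) the same weights p that give x in the original simplex give x again in each rescaled one.

```python
from fractions import Fraction

def combine(p, v):
    """sum(p.v): the weighted sum of the vertices v with weights p."""
    return tuple(sum(w * vi[k] for w, vi in zip(p, v)) for k in range(len(v[0])))

def rescale(v, x, t):
    """Replace each vertex v(i) with x + t.(v(i) - x)."""
    return [tuple(xk + t * (vk - xk) for vk, xk in zip(vi, x)) for vi in v]

v = [(0, 0), (6, 0), (0, 6)]
p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # all positive, sum is 1
x = combine(p, v)  # an interior point of v's simplex

for t in (Fraction(1, 4), Fraction(1, 100), Fraction(7)):
    shrunk = rescale(v, x, t)
    assert combine(p, shrunk) == x  # same weights, same interior point
    print(t, shrunk)
```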

Since the space of linear maps (M: |N) is an R-linear space, just like M and N, we can (at least when M and N are finite-dimensional) form simplices in this space of linear maps and ask whether, for any simplex G about a putative derivative of f at x, there is some simplex I about x for which every simplex about x whose vertices lie in I has chord-gradient inside the specified simplex G. We can, furthermore, ask whether there are some natural h and positive T for which, for every positive t < T and every voluminous simplex about x within I power(h+1, t)-scaled towards x, the chord-gradient of this simplex lies within the power(h+1, t)-scaled version of G. If such h and T exist for any simplices I about x and G about the putative gradient, we can definitely accept the given putative gradient as the derivative of f at x.

Alternatively, given (M: f :N) and a putative gradient f'(x) for it at x, we can define err(u) = f(u) −f(x) −f'(x)·(u −x) as the error in extrapolating f from x to u and ask whether, for some simplex K about 0 in M and H about 0 in N, err(u) lies within K whenever u−x is in H; and, as before, we can ask whether this remains true when we scale K towards zero faster than we scale H towards zero. If we can, for any H and K, then we can confidently accept f'(x) as a derivative for f at x.
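A short sketch of err in action, again assuming the product function on the plane at x = (1, 2): with the right gradient, err(u) shrinks faster than u − x does, while a wrong candidate leaves an error that only shrinks in proportion. The candidates named good and bad are chosen for the example.

```python
from fractions import Fraction

def f(point):
    a, b = point
    return a * b

x = (Fraction(1), Fraction(2))

def err(u, gradient):
    """f(u) - f(x) - gradient.(u - x), for a covector gradient on the plane."""
    du = (u[0] - x[0], u[1] - x[1])
    return f(u) - f(x) - (gradient[0] * du[0] + gradient[1] * du[1])

good = (Fraction(2), Fraction(1))  # the derivative of f at x
bad = (Fraction(2), Fraction(2))   # a deliberately wrong candidate

for t in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    u = (x[0] + t, x[1] + t)
    # Ratio of error to the scale t of the displacement, for each candidate.
    print(t, err(u, good) / t, err(u, bad) / t)
```

Along this line of approach err(u) is t.t for the good candidate, so its ratio to t tends to zero; for the bad candidate the ratio is t − 1, which does not.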

By such means, one may define differentiability of f at x in N, with derivative linear from N to M. The special case where N and M are R then simply exploits the usual isomorphism between {R-linear (R: |R)} and R itself.

In particular, we can recast this by taking an intersection. A first naïve approach would be to intersect the collections {gradient(S, f): simplex S, which lies within B} over all simplices B about p, with gradient(S, f) being the linear map we get from using the vertices of S to sample the values of f; we might hope for the intersection to be {f'(p)}. However, it is easy enough to find cases where the intersection is empty yet it really makes sense for f to be differentiable: for example, consider cube'(0), where cube is (: power(3) :{reals}); every chord of cube has gradient (power(3, u) −power(3, v))/(u −v) for some reals u, v; this is equal to both u.u +u.v +v.v and power(2, u +v) −u.v; when one of u, v is positive and the other is negative, the latter is necessarily positive; otherwise, the former is necessarily positive unless u = 0 = v; so distinct u, v give a positive gradient for their chord. The only value cube'(0) could have is 0, but no chord actually delivers it; although short enough chords about 0 do give gradients arbitrarily close above zero. Orthodoxy would formalise this by taking the closure of each of the sets intersected; which, with simplices, we can do by including the boundaries of simplices and constructing, for function (U: f :V) defined throughout some simplex F ⊂ V about some point p of the linear space V, with U also linear:

slope(f, p) = intersect({simplices in linear(U: |V)}: S ←B; S subsumes {gradient(R, f): simplex R within B} :{simplices about p in V})

where, for purposes of the relation intersected, the simplex S is a set of linear maps (U: |V), delineated by its vertices. Since each simplex S (in {linear maps (U: |V)}) includes its boundary, we will pick up a gradient that's a limiting value, like 0 = cube'(0), and since we take an intersection, we avoid including more than this needs. If slope(f, p) has exactly one member, we describe f as differentiable at p, with that one member as the derivative of f at p.
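The cube example that motivates including boundaries can be checked on a sample of chords. The sketch below, with cube_chord as an illustrative name, confirms on a grid of exact fractions that every chord gradient is positive and matches u.u +u.v +v.v, while chords about 0 give gradients shrinking towards the limiting value 0. A finite sample is of course only an illustration, not a proof.

```python
from fractions import Fraction

def cube_chord(u, v):
    """Gradient of the chord of cube between distinct inputs u and v."""
    if u == v:
        raise ValueError("a chord needs two distinct inputs")
    return (u ** 3 - v ** 3) / (u - v)

samples = [Fraction(k, 10) for k in range(-5, 6)]  # includes 0 and both signs
gradients = [cube_chord(u, v) for u in samples for v in samples if u != v]

assert all(g > 0 for g in gradients)  # no sampled chord delivers gradient 0
assert all(cube_chord(u, v) == u * u + u * v + v * v
           for u in samples for v in samples if u != v)

for t in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # Chords with 0 between their ends: positive gradients that shrink with t.
    print(t, cube_chord(-t, 2 * t))
```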

Using hermitian forms

Now, since I'm not sure I can make that work for complex vector spaces, let's look at how orthodoxy can use simplices to do its usual limit process to define differentiation in the linear spaces themselves, rather than in terms of components and differentiation of functions from the reals to itself.

This can be done essentially the same way as the second alternative above, but using the unit balls of hermitian forms instead of simplices about the origins in the input and output space. As before, in an n-dimensional input space, we can get a chord off any n-simplex whose edges span the input space (I'm just not so sure we get a sensible notion of interior out of such a simplex, in the complex case). As before, we can define err(u) = f(u) −f(x) −f'(x)·(u −x) for u in N, using the putative derivative for f at x as f'(x).

If we have any hermitian forms on N and M for which err(u) lies in M's power(h+2, t)-radius ball whenever u−x lies in N's power(h+1, t)-radius ball, for all positive t < some positive T, then we can accept the f'(x) used to compute err(u) as a derivative for f at x.
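As a hedged illustration of this criterion, the sketch below takes complex squaring, viewed as a map on the real plane, with the standard sum-of-squares form assumed on both input and output spaces, and h = 0. It works with squared radii to stay in exact arithmetic; the function, the point x and the name within_bound are choices made for this example, and sampling a grid only illustrates the condition rather than establishing it.

```python
from fractions import Fraction

def f(point):
    a, b = point
    return (a * a - b * b, 2 * a * b)  # (a + i.b) squared, as a real pair

x = (Fraction(1), Fraction(1))
# Putative derivative at x = (a, b): the matrix [[2a, -2b], [2b, 2a]].
derivative = [[2 * x[0], -2 * x[1]], [2 * x[1], 2 * x[0]]]

def norm_squared(vector):
    return sum(c * c for c in vector)

def err(u):
    """f(u) - f(x) - f'(x).(u - x)."""
    du = [a - b for a, b in zip(u, x)]
    linear = [sum(m * d for m, d in zip(row, du)) for row in derivative]
    return [fu - fx - l for fu, fx, l in zip(f(u), f(x), linear)]

def within_bound(u, t, h=0):
    """If u - x lies in the power(h+1, t)-radius ball, check that err(u)
    lies in the power(h+2, t)-radius ball; vacuously true otherwise."""
    du = [a - b for a, b in zip(u, x)]
    if norm_squared(du) > t ** (2 * (h + 1)):
        return True
    return norm_squared(err(u)) <= t ** (2 * (h + 2))

for t in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    grid = [(x[0] + i * t / 4, x[1] + j * t / 4)
            for i in range(-4, 5) for j in range(-4, 5)]
    assert all(within_bound(u, t) for u in grid)
```

For this particular f the error is exactly the (complex) square of u − x, so its size is the square of the size of u − x; that is why the bound holds for every positive t here.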

Written by Eddy.