Computing the Fibonacci numbers

This page is really about how to tell a computer what to do in such a way that it goes about it efficiently; working out how much work the computer has to do to complete the computation involves some mathematics in any case, so the page may as well address an entirely mathematical problem.

Fibonacci's sequence is defined by:

f(1) = 1 = f(2)
f(i+2) = f(i) +f(i+1)

Formally, this only specifies f(i) in so far as i is a whole number (i.e. one with no fractional part); but it does specify f(i) for every whole number. We can work forwards to see that f(3) = 2, f(4) = 3, f(5) = 5, f(6) = 8, f(7) = 13, f(8) = 21 and so on. We can turn the second rule around to say f(i) = f(i+2) −f(i+1) and work backwards to get f(0) = 0, f(−1) = 1, f(−2) = −1, f(−3) = 2, f(−4) = −3, f(−5) = 5, f(−6) = −8 and so on. None of this gives us any clue as to a value for f(1/2) or f(π), so Fibonacci's specification only tells us f's values at whole numbers (positive, zero and negative; collectively known as integers).

Notice that f(0) = 0, f(1) = 1 would work as an entirely adequate replacement for the first rule (and I could equally phrase it as (:f|2) = 2). Indeed, specifying f at any two distinct whole numbers would suffice to determine it at every whole number. Also, observe that f(−odd) = f(odd) but f(−even) = −f(even) for each odd or even input explored above; this is easy enough to prove inductively.

For this page's purposes, I'm going to use python as the programming language in my examples: however, the reader is welcome to read the code as pseudo-code (and I'll aim to restrict myself to python that reads well as such), understanding indented lines as being subordinates controlled by the last less-indented line. The key-word def begins the definition of a function.

The inaccurate way

It's possible to prove that any sequence f satisfying the second rule above is, for some k and h, f(n) = k.power(n, g) +h.power(−n, −g) where g = (1 +√5)/2 is known as the golden ratio, for which g.g = 1 +g, hence 1/g = g −1. The first rule above (in its shifted form giving f(0) = 0, f(1) = 1) then implies 0 = k +h, 1 = k.(g −1/(−g)) = k.(2.g −1) = k.√5, whence

f(n) = (power(n, g) −power(−n, −g))/√5

This even lets us extend f to values other than whole numbers by interpreting power(i, q) as exp(i.log(q)), albeit the fact that (1−√5)/2 < 0 leads to its non-whole powers being multivalued (because log(−1) is any odd multiple of π.√−1); so fib becomes a relation rather than a function, when extended beyond the integers.

If I'm happy with the precision of floating point arithmetic this lets me compute f very efficiently; however, floating point arithmetic is all about adequate approximations and tends to produce mildly embarrassing results like 4 × 0.035 = 0.14000000000000001 (when it's easy to see that the right answer is 0.14 exactly) – indeed, using it to evaluate the formula (using IEEE floating-point arithmetic, with the currently usual double-precision 64-bit floating point type, which has 53 bits of mantissa) gives me f(4) as 3.0000000000000004 and f(−4) as −3.0000000000000004, with similar errors for bigger values on each side of zero. Worse yet, if I express the second term as −(−1/g)**n rather than −(−g)**(−n) in defining the function, f(−2) comes out as −1.0000000000000002; little details in how the same function is coded can matter.

>>> g = .5 * (1 +5**.5)
>>> g * (1 -g)
-1.0000000000000002
>>> def fib(i): return (g**i -(-g)**(-i)) / 5**.5
...
>>> def f(i): return (g**i -(-1/g)**i) / 5**.5
...
>>> fib(4)
3.0000000000000004
>>> f(-2)
-1.0000000000000002
>>> fib(-2)
-1.0
>>> fib(-3)
2.0
>>> fib(-4)
-3.0000000000000004

I can of course fix that by rounding to the nearest integer (the python function int() rounds its input down, so adding .5 to the input rounds to nearest, give or take handling of exact halves), but even that'll break down by around f(79), which is bigger than power(53, 2), at which point the errors are comparable to the differences between whole numbers. Indeed, the first error shows up at input 71, not equal to the sum of its predecessors (and, in particular, getting an even answer, which should only happen at inputs that are multiples of 3):

>>> int(fib(69) +.5), int(fib(70) +.5), int(fib(71) +.5)
(117669030460994, 190392490709135, 308061521170130)
>>> int(fib(69) +.5) + int(fib(70) +.5)
308061521170129

The first rule gives f(1) and f(2) as whole numbers (they're both 1, which is as obviously whole as a number can be) and the second rule gives any other f value by adding or subtracting f values for inputs closer to 1 and 2 than the new one we're obtaining, so we can safely induce that all f values are whole numbers, with no fractional part; which makes it silly to compute them via floating point numbers (i.e. ones with a fractional part). That those numbers in fact represent (using dyadic rational approximations) irrationals doesn't make it any nicer. So I'll be looking at how we can compute f's values precisely and ignore the short-cut via this nice piece of theory. None the less, notice that the value of f(i) grows exponentially with i's magnitude.

The really obvious way

Most modern programming languages (for these purposes, ForTran 77 isn't modern, although it does have more modern dialects) allow one to write functions which call themselves; this is called recursion. A language can be clever enough to dodge the problem I'm about to point out, but it's instructive to see why simple-minded languages run into problems with the obvious way to implement Fibonacci's sequence. We've seen how it was originally specified, so we can write ourselves a function:

def Fibspec(i):
	if i == 1 or i == 2: return 1
	if i > 1: return Fibspec(i -2) +Fibspec(i -1)
	return Fibspec(i +2) -Fibspec(i +1)

This simply states the rules given above: the first rule is the first line, f(1) = 1 = f(2) (the == operator tests for equality, as distinct from the = operator, assignment, which gives its left operand(s) the value(s) its right operand(s) previously had). The second line is just the second rule; and the third line is the reverse of it, needed to get answers for i < 1.

Unless the programming language you're using is very clever, it's going to do an awful lot of computation, if you use a definition like this. Let me explain why. First, observe that if I ask for Fibspec(1) or Fibspec(2), I get my answer back immediately: it's 1. Thus far, Fibspec is doing well. Next, if I ask for Fibspec(3) it'll test that 3 isn't 1 or 2, notice that 3 > 1 and proceed to work out (as its answer) Fibspec(1) +Fibspec(2) – each of which it works out directly; so (aside from some over-head, which I'll ignore, in how the language lets a function call itself recursively) it has to do one addition to get the answer 2. (It also has to do two subtractions; indeed, the number of subtractions involved in computing f(i), for i > 0, is exactly twice the number of additions; but this – along with the two recursive function calls per addition – just means the total work involved is a simple constant times the work for one addition; so let's stick to counting the additions.) If I ask for Fibspec(4), it'll likewise do one addition to combine Fibspec(2) with Fibspec(3); computing the latter costs it one addition, as before, so it costs two additions to get the answer Fibspec(4) = 3. Notice that the number of additions we've needed to do, each time, is just one less than the answer we obtained. For Fibspec(5), we work out the previous, costing us 2 +1 additions, and do one more addition, getting the answer 5 from 4 additions. Again, the number of additions is only one less than the answer.

In fact, if we can compute f(i) and f(i+1) at a cost of one addition less than their respective values, then computing f(i+2) costs us one addition to add them together plus f(i)−1 to compute f(i) and f(i+1)−1 to compute f(i+1); for a total of 1 +f(i)−1 +f(i+1)−1 = f(i) +f(i+1) −1 = f(i+2)−1. So the number of additions we have to do to compute Fibspec(i) is going to be only one less than the answer we get. It shouldn't be hard to see that similar applies for i < 1; computing Fibspec(i) requires slightly more (rather than fewer) subtractions (plus twice as many additions and recursive function calls) than the magnitude of the answer, in this case. Given that this answer grows exponentially with the magnitude of i, this means that Fibspec is a horribly expensive way to compute the Fibonacci numbers.
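To make this concrete, here's a small counting variant of Fibspec (a sketch of my own, purely for illustration; the name Fibcount is mine) that returns the answer along with the number of additions performed, for positive i:

def Fibcount(i):
	"""Sketch: return (Fibspec(i), additions performed), for i >= 1."""
	if i == 1 or i == 2: return 1, 0
	a, na = Fibcount(i -2)
	b, nb = Fibcount(i -1)
	return a +b, na +nb +1

>>> [Fibcount(i) for i in (5, 10, 15)]
[(5, 4), (55, 54), (610, 609)]

In each pair, the number of additions is indeed one less than the answer.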

The obviously better way

So let's look for a better way to compute f(i). If you look at what we were doing before, you should be able to see that the real problem is that, when computing f(i+2), we first compute f(i), then compute f(i+1); but, in the course of doing the latter, we re-compute f(i). That's obviously dumb: if we'd just remembered it from before, we could save that extra computation. Since the same applies at each step back towards 1 and 2, we're clearly better off simply working forwards:

def Fibiter(i):
	if -1 < i < 1: return 0
	old, ans = 0, 1
	while i > 1:
		old, ans = ans, ans +old
		i = i -1

	while i < -1:
		old, ans = ans, old -ans
		i = i +1

	return ans

Note that python allows several operands on each side of assignment; old, ans = 0, 1 sets old to 0 and ans to 1; and old, ans = ans, ans + old computes the two values on the right before doing the assignments, so ans gets the prior value of ans + old and old gets the prior value of ans. Notice, also, that I've thrown in a piece of robustness for the computation: if i isn't a whole number, the function shall still return an answer (rather than, say, spinning forever in a loop that's been confused by an unexpected value); and that answer is what you would get by calling Fibiter with a whole number close to i.

This time, we're just keeping track of two values – initially old = f(0) and ans = f(±1) – and working our way up or down to i one step at a time, doing one addition and one subtraction on each step, so it should be clear that the number of additions and subtractions we have to do is equal to the magnitude of i, give or take one. So, with a little easy thinking, we've now got a way to compute f(i) at cost proportional to i rather than proportional to f(i); given that the latter grows much faster than i, this is clearly an improvement. (Furthermore, although this needn't be true in all languages, calling functions (recursively or otherwise) is more expensive, in imperative languages, than iterating a loop; so not only do we make fewer steps of computation, but the cost of each step is reduced, too. We incur some small overhead in having two new local variables, so computing f(i) for a few small positive i will be minutely slower, but f(0) is now very cheap and all other f(i) shall be cheaper to compute.)
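As a quick check, an interactive session, whose outputs match the values worked out near the top of this page:

>>> [Fibiter(i) for i in range(-6, 7)]
[-8, 5, -3, 2, -1, 1, 0, 1, 1, 2, 3, 5, 8]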

The interested reader would do well to think for a while, before reading on, about how it might be possible to do even better than this.

The deviously clever way

Some clever person noticed (many years ago) something marvelously subtle that we can exploit. However, before I can explain how that works, I need to introduce you to the efficient way to compute powers, given a multiplication. While you can compute a power by repeated multiplication

def powiter(n, x):
	if n < 0: return powiter(-n, 1./x)
	ans = 1
	while n > 0:
		ans = ans * x
		n = n -1
	return ans

(in which the number of multiplications we do (and, incidentally, the number of subtractions we do) is equal to the power we're raising to), there's a much more efficient way to do it, in which the number of multiplications we do is just the number of binary digits it takes to write out n plus the number of non-zero digits among these (so at most twice the logarithm of n to base two). Once you understand that, I'll be ready to show you how to compute Fibonacci more efficiently.
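For example, n = 13 is 1101 in binary: four binary digits, three of them non-zero, so this approach costs three multiplications into the accumulating answer plus (at most) four squarings – seven multiplications in all – where naive repeated multiplication would use thirteen.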

Note that, if we're happy to use floating point arithmetic, we could also obtain power(n, x) as exp(n * log(x)), whose cost doesn't depend on n at all – although the costs of calling exp and log run to quite a lot of multiplications, divisions, additions and subtractions; so this only works out cheaper when n is quite big. So, for a (more-or-less) fixed large cost, we can compute power, hence f(i) via the golden ratio approach, at cost independent of i, provided we're happy with floating point. However, as noted above (where I used python's built-in power operator, g**n being what I write as power(n, g), and this built-in probably does indeed use exp and log in this way), I'm trying to avoid floating point – which, for large values of i, is apt to leave us having to round a big floating-point value to its nearest integer, which becomes imprecise once the output of f(i) is big enough that the floating point representation's mantissa has too few digits to represent it exactly.

Efficient power

def powbits(n, x):
	if n < 0: n, x = -n, 1./x
	ans = 1
	while n > 0:
		if n % 2: # n is odd
			ans = ans * x
			n = n -1
		n = n // 2
		x = x * x
	return ans

I encourage the interested reader to think about why this is correct before reading on. The % operator computes a remainder: p − (p % q) is always a multiple of q; so n % 2 is one (which if treats as true) when n is odd and zero (which if treats as false) when n is even. The // operator is whole-number division, rounding the quotient down (towards minus infinity, whatever the signs of its operands) to always produce a whole-number result. The two are related by a % b +b * (a // b) == a.
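A few examples from an interactive session:

>>> 13 % 2, 13 // 2
(1, 6)
>>> -13 // 2, -13 % 2
(-7, 1)
>>> -13 % 2 +2 * (-13 // 2)
-13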

We can use this with floating point arithmetic, just as we can for whole number arithmetic; and, for modest values of n, it'll be faster than exp(n * log(x)), though with the same precision issues as I note above. As we'll soon see, however, this is no faster than the way we'll be computing f(n); only via exp and log can floating point win, and then only for large n, at which point the precision problems are apt to matter. So, for precise computations, what I'll show soon is in fact as fast as the floating point approach, except possibly for a bounded range of values of i that are large enough for exp(i * log(x)) to be cheaper than powbits(i, x) but small enough that f(i) is faithfully represented by a floating point mantissa.

What powbits does is to express n in binary and multiply ans by power(b, x) for each power of two, b, that's represented by a non-zero digit of n, when written in binary. Computing the power(b, x) is done by repeatedly squaring x. To help you understand how powbits works, consider this recursive form of it:

def powrecurse(n, x):
	if n < 0: return powrecurse(-n, 1./x)
	if n % 2: return x * powrecurse((n -1) // 2, x * x)
	if n > 0: return powrecurse(n // 2, x * x)
	return 1

The first line is as before. If n is zero, we'll fall all the way through to the end and get 1, which is just power(n, x) with n = 0. If n is even and positive, we get to the last but one line: power(n, x) is, in this case, just power(n/2, x*x), so that's what we return. For positive odd n, n−1 is even, so power(n−1, x) is just power((n−1)/2, x*x) and multiplying it by x yields power(n, x). So I hope you can see that powrecurse does compute power(n, x).
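A quick session to check that the three implementations agree:

>>> powiter(10, 2), powbits(10, 2), powrecurse(10, 2)
(1024, 1024, 1024)
>>> powbits(-3, 2.), powrecurse(-3, 2.)
(0.125, 0.125)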

In a really clever programming language, powrecurse might even be compiled efficiently: but most imperative languages aren't that clever, so let's see why powbits is doing the same thing as powrecurse. Each time we go round the loop in powbits is equivalent to one layer of recursion in a call to powrecurse: the first time round the loop, we have our original x; if n is odd, we multiply ans by x and leave the subsequent iterations to multiply ans by power((n−1)/2, x*x); otherwise, we simply leave those subsequent iterations to multiply ans by power(n/2, x*x). By the time we've fallen off the end of the loop, we've worked out what we were originally asked for, but we've only gone round the loop once per binary digit of n (and we've done it without relying on our language to be clever enough to implement tail recursion with an accumulator). Before I move on, let me just throw in how I'd actually implement powbits in python:

from operator import mul, truediv
def powbits(n, x, v=1):
	"""Returns v * x**n, for integer n; v defaults to 1."""
	if n < 0: n, join = -n, truediv
	else: join = mul
	while n > 0:
		n, b = divmod(n, 2)
		if b: v = join(v, x)
		x *= x
	return v

Since we're going to want a variable in which to accumulate the answer, incrementally multiplying by powers of x, and the caller not infrequently wants to multiply the answer by something, we may as well let the caller pass that to the function, in place of the 1 we'll use by default; hence parameter v replaces local variable ans. This also makes powbits usable when x isn't a plain number, e.g. when it's a matrix, and might not support multiplication by plain numeric 1, requiring instead a suitable unit of its own kind, that we can pass in as v. In support of that, the negative power case is here handled by dividing v by x instead of by replacing x by its inverse, mediated by over-riding the mul function – the operator module's mul(v, x) just returns v * x but we replace this with truediv(v, x) = v / x for negative n (and truediv uses the division for which 1/2 is 0.5 instead of rounding to 0 because 1 and 2 were given as integers). Furthermore, this even lets us use powbits(n, p, q) with negative n to divide q repeatedly by p, even when p has no multiplicative inverse, provided q actually has p as a sufficiently repeated factor (and their type implements division to recognise that case; as, for example, my study.maths.polynomial.Polynomial does).

The function divmod (a built-in of the language) returns a pair of values, the quotient and remainder from performing a division; so n, b = divmod(n, 2) halves n, rounding down if n was odd, and sets b to the remainder it got when doing so. So this version does the same as the previous one, but is a little terser and more flexible … and that first line, in triple-double-quotes, is documentation – because real code should always be documented.
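For example (the last call exploits the v parameter):

>>> powbits(10, 2)
1024
>>> powbits(-2, 2.)
0.25
>>> powbits(3, 10, 7) # 7 * 10**3
7000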

Implementations of power() commonly also accept an optional modulus parameter; when it is supplied, the arithmetic is performed modulo it. In such an implementation, it can arise that the repeated squaring of the value being raised to a power, x above, can become equal to the multiplicative identity of its type; in particular, when raising a whole number to a power modulo a prime, this is naturally prone to happening. So an implementation might well want to exit its loop early if that happens. In our case, however, the value we'll be taking a power of doesn't have any power that's an identity.
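To sketch the idea (this powmod is my own illustration; python's built-in pow accepts exactly such an optional third, modulus, argument), here's the simple powbits adapted to work modulo m, with the early exit:

def powmod(n, x, m):
	"""Sketch: x**n modulo m, for whole n >= 0."""
	ans = 1 % m
	x = x % m
	while n > 0:
		if x == 1: return ans # remaining factors are all 1
		n, b = divmod(n, 2)
		if b: ans = ans * x % m
		x = x * x % m
	return ans

For example, powmod(10, 3, 7) gives 4, since power(10, 3) = 59049 = 7 × 8435 + 4.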

Fast Fibonacci

So now let's go back to looking at Fibonacci. In Fibiter, we have two variables, old and ans, that hold a pair of values of the Fibonacci sequence; at each iteration, in either loop, we compute new values for them from (only) their prior values. That computation is very simple: old gets ans's prior value and ans gets the sum or difference of their prior values. If we used a vector, with two entries, to hold these two variables, each step of the iteration would thus be performing a simple linear transformation on the vector: this can be expressed as the contraction of a matrix with the vector. Since each iteration contracts the same matrix with the vector, overall we're simply contracting the i-th power of the appropriate matrix with the start vector, [0,1]. So the loops in Fibiter are like the loop in powiter, but applying a matrix to a vector instead of multiplying numbers; and the net effect is to compute a power of a matrix and contract it with the vector. If we can compute the power of the matrix in powbits's way, we can reduce the cost from of order i to of order the number of bits needed to express i – i.e. its logarithm (to base two).

Now, the matrix for our forward iteration looks like F = [[0,1],[1,1]], with the second pair as the bottom row; contracting it with [old,ans], with ans below old, gives [ans,old+ans] as the new value for [old,ans]. When we multiply F by itself repeatedly, we get F·F = [[1,1],[1,2]], F·F·F = [[1,2],[2,3]], F·F·F·F = [[2,3],[3,5]]; in fact (as the interested reader should be able to verify easily) we get power(i, F) = [[f(i−1),f(i)],[f(i),f(i+1)]]; so we can compute f(i) as the second co-ordinate of power(i−1, F)·[0,1], for positive i. Indeed, python will actually let me define a class that implements matrix multiplication such that I can simply use powbits to implement this, with similar for F's inverse to handle negative i.
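To illustrate, here's a minimal sketch (the names Mat2 and Fibmat are mine; it handles only i >= 0, since negative i would need F's inverse, as noted) that leans on the flexible powbits above:

class Mat2:
	"""Sketch: the 2-by-2 matrix [[a, b], [c, d]]."""
	def __init__(self, a, b, c, d): self.abcd = (a, b, c, d)
	def __mul__(self, other):
		a, b, c, d = self.abcd
		e, f, g, h = other.abcd
		return Mat2(a * e + b * g, a * f + b * h,
			    c * e + d * g, c * f + d * h)

def Fibmat(i):
	"""Sketch: f(i), for i >= 0, as the top-right entry of power(i, F)."""
	return powbits(i, Mat2(0, 1, 1, 1), Mat2(1, 0, 0, 1)).abcd[1]

so that:

>>> [Fibmat(i) for i in range(8)]
[0, 1, 1, 2, 3, 5, 8, 13]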

This sort of transformation, representing a function's internal state by a vector and each step of an iteration by a linear map applied to that vector, so as to replace the iteration by some more efficient computation derived from the linear map, can be applied to reasonably many contexts, making it worth checking one's code for opportunities to exploit such a transformation. Notice, however, that (in the present example) power(i, F) contains some redundancy: two of its entries are equal and one of the others is the sum of the last with one of the equal ones. We can surely do (a bit) better.

Remember earlier, where we used a solution to g.g = g +1 to obtain solutions of form f(n) = k.power(n, g) +h.power(−n, −g); so let's look at polynomials modulo this equation, i.e. with square = power(2) = (: x.x ←x :) treated as if it were equal to (: x +1 ←x :). This reduces the space of polynomials to a two-dimensional space; each so-reduced polynomial is simply (: A.x +B ←x :) for some A, B. If we multiply it by power(1) = (: x ←x :), it becomes (: A.x.x +B.x = (A +B).x +A ←x :), so the pair of coefficients is subjected to exactly the transformation Fibiter applied to ans and old, with ans as A and old as B. Thus power(0) = (: 1 ←x :) represents a start-state, slightly different from that of Fibiter, with old = 1 and ans = 0; and each successive power(i, x) reduces to f(i).x +f(i−1). Thus taking powers of x in this space of reduced polynomials is equivalent to computing Fibonacci's sequence.
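For example: power(2) reduces to x +1 = f(2).x +f(1); multiplying by x again, power(3) reduces to x.x +x, which reduces to (x +1) +x = 2.x +1 = f(3).x +f(2); and once more, power(4) reduces to 2.x.x +x = 3.x +2 = f(4).x +f(3).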

The reduced space of polynomials represents our iteration in a way I can easily motivate from the polynomial that appears in the analytic treatment of the series; that works generally for similar iterations. However, we're lucky in the details of how we use it to encode our iteration. In other cases, such as the Perrin iterator, the corresponding analysis can require more thought; none the less, the essential treatment of the problem is the same. Reduce polynomials modulo the one that mimics our iteration, then look at the resulting multiplication to find a way to represent tuples of iterated values and a polynomial to multiply by to implement the iteration.

When we multiply two of our reduced polynomials, the cross-terms in x.x reduce away; (A.x +B).(C.x +D) = A.C.x.x +(B.C +A.D).x +B.D = (B.C +A.D +A.C).x +(A.C +B.D). The code doesn't actually need to know we're working with polynomials; it just sees us working with pairs of numbers and using an eccentric multiplication on such pairs: [A, B].[C, D] = [A.D +A.C +B.C, A.C +B.D]. The pairs [0, 1] and [1, 0] represent power(0) and power(1), respectively; the former is our multiplicative identity and the successive powers of the latter shall give us Fibonacci's sequence. We just need to code the equivalent of powbits for the multiplication of our reduced polynomials. So first we define (using python tuples (a, b) rather than lists [A, B], simply because the language provides convenient support for these):

def fibtimes(p, q):
	(a, b), (c, d) = p, q # unpack as pairs
	return a * (c +d) +b * c, a * c +b * d

which is just our peculiar multiplication. (We could, equally, subclass the built-in tuple type in python, of which (a, b) and (c, d) are instances, and define the methods on it to support this as multiplication in the ordinary way, rather than having to call fibtimes explicitly; we could then use powbits, above, rather than a custom implementation of power – see the sketch after Fibpow, below.) Then we implement the power operation for this multiplication restricted to only computing powers of (1, 0) (i.e. power(1) or x ←x), as

def Fibpow(i):
	(a, b) = (1, 0) # x
	(c, d) = (0, 1) # ans
	while i > 0:
		i, n = divmod(i, 2)
		if n: c, d = fibtimes((a,b), (c,d))
		(a, b) = fibtimes((a,b), (a,b))
	return (c, d) # x**i
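Returning to the parenthetical above: such a tuple subclass might look like this (a sketch; the name FibPoly is mine):

class FibPoly(tuple):
	"""Sketch: the reduced polynomial a.x +b as the pair (a, b)."""
	def __mul__(self, other):
		return FibPoly(fibtimes(self, other))

with which powbits(7, FibPoly((1, 0)), FibPoly((0, 1))) evaluates to (13, 8), whose first entry is f(7) = 13.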

We can actually do a little better than computing Fibpow, extracting c from the pair it returns and munging to handle sign; since we only need one part of its return in any case, we can simply return that; and we can handle negative exponents by exploiting a multiplicative inverse for (: x ←x :), namely (: x−1 ←x :), since x.(x−1) = x.x −x which is equivalent to x+1 −x = 1. So we can use [1, −1] as (: 1/x ←x :), in place of [1, 0] as (: x ←x :), and flip the sign of i, if negative. At the same time, we can avoid squaring (a, b) the last time round the loop (where its value shalln't actually be used again) by moving the rest of the loop after it and unwinding a copy of it out into the preamble, where we can simplify it:

def Fibfast(i):
	if i < 0: a, b, i = 1, -1, -i
	else: a, b = 1, 0

	i, n = divmod(i, 2)
	if n: c, d = a, b
	else: c, d = 0, 1

	while i > 0:
		a, b = fibtimes((a,b), (a,b))
		i, n = divmod(i, 2)
		if n: c, d = fibtimes((a,b), (c,d))
	return c

(Alternatively, the unrolled first iteration could initialize slightly differently and record which of c, d to return at the end – details I leave as an exercise for the interested reader.) Since we've now reduced our computation to an analogue of power, the amount of computation needed to compute f(i) only grows, with i, as fast as the number of digits it takes to write i out in binary (i.e. the logarithm of i to base two), which grows much less rapidly than i. We are doing rather more computation at each of these fewer steps (rather than just one addition and one subtraction, we're doing: either four multiplications, three additions and some easy binary stuff (taking remainder by two and dividing by two is really cheap); or eight multiplications, six additions and the cheap binary stuff), so the Fibiter solution is faster for small values of i; but the Fibfast solution wins for larger values of i.
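A quick session to confirm the various implementations agree:

>>> all(Fibfast(i) == Fibiter(i) == Fibspec(i) for i in range(-12, 13))
True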

If you're always computing answers for small values of i, and doing so often enough that you care about performance, Fibiter is easily beaten by an implementation which remembers the answers it's worked out in the past; which can be just as efficiently implemented, using a variant of Fibspec which caches its answers, as by using Fibiter. In python, that would be written:

def Fibcache(i, cache=[0, 1]):
	if i < 0:
		return Fibcache(-i) if i % 2 else -Fibcache(-i)

	try: ans = cache[i]
	except IndexError:
		assert i > 1, "cache started with two entries"
		ans = Fibcache(i -2) +Fibcache(i -1)
		assert len(cache) == i
		cache.append(ans) # ans is now cache[i]
	return ans

(One could also choose to limit the cache size; before the try stanza, check for i > BIG, for some suitably big limit on cache size, and fall back on Fibfast(i), optionally optimising to exploit the knowledge i > 0.) Aside from the special handling of negative i, some of the cleverer (usually functional) programming languages might even compile their equivalents of Fibspec to something along these lines. On the other hand, if you were only interested in f(i) for a sparse few large i, Fibfast wins (strongly); and it'd be asking a lot to expect the compiler to spot that it can re-arrange Fibspec, or anything even remotely resembling it, into that. In fact, if your compiler is that clever, you should probably (politely) invite it to take the Turing test.
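The bounded-cache variant suggested above might look like this (a sketch of my own; the name Fibbound and the choice of BIG are mine):

def Fibbound(i, cache=[0, 1], BIG=1024):
	if i < 0:
		return Fibbound(-i) if i % 2 else -Fibbound(-i)
	if i > BIG: return Fibfast(i)

	try: ans = cache[i]
	except IndexError:
		ans = Fibbound(i -2) +Fibbound(i -1)
		assert len(cache) == i
		cache.append(ans) # ans is now cache[i]
	return ans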

Or in C++

One can, of course, do the same in any half-way decent programming language, so here's an illustration in C++:

#include <stdint.h>
#include <utility>

int64_t Fibspec(int8_t i) {
    if (i < 0) return Fibspec(i +2) -Fibspec(i +1);
    if (i > 1) return Fibspec(i -2) +Fibspec(i -1);
    return i;
}

int64_t Fibiter(int8_t i) {
    if (i == 0) return i;
    int64_t old = 0, ans = 1;
    while (i > 1) {
        old = std::exchange(ans, old +ans);
        --i;
    }
    while (i < -1) {
        old = std::exchange(ans, old -ans);
        ++i;
    }
    return ans;
}

template <typename F = double, typename C = unsigned>
F powiter(C n, F x) {
    if (n < 0) return powiter(-n, F(1) / x);
    F ans = 1;
    while (n > 0) {
        ans *= x;
        n -= 1;
    }
    return ans;
}

template <typename F = double, typename C = unsigned>
F powbits(C n, F x, F v = F(1)) {
    if (n < 0) return powbits(-n, F(1) / x, v);
    while (n > 0) {
        if (n & 1)
            v *= x;
        n >>= 1;
        x *= x;
    }
    return v;
}

template <typename F = double, typename C = unsigned>
F powrecurse(C n, F x, F v = F(1)) {
    if (n < 0)
        return powrecurse(-n, F(1)/x, v);
    if (n > 0)
        return powrecurse(n >> 1, x * x, n & 1 ? x * v : v);
    return v;
}

Note that fib(93) will overflow int64_t, so even int8_t suffices to represent all the inputs we can sensibly support, in the absence of big-number support such as python provides by default. The pair-arithmetic begs to be rewritten in class-structured form as:

class FibPair {
    int64_t lin, con;
    FibPair(int64_t n, int64_t c) : lin(n), con(c) {}
    FibPair(int64_t c) : lin(0), con(c) {}
    FibPair(const FibPair &) = default;
    FibPair &operator=(const FibPair &) = default;
    FibPair operator*(FibPair other) const {
        return FibPair(lin * (other.lin + other.con) + con * other.lin,
                       lin * other.lin + con * other.con);
    }
    static FibPair power(unsigned n, FibPair x, FibPair v) {
        if (n == 0) return v;
        return power(n >> 1, x * x, n & 1 ? x * v : v);
    }
public:
    static int64_t FibFast(int8_t n) {
        FibPair x = n < 0 ? FibPair(1, -1) : FibPair(1, 0);
        // Unroll the lowest bit, as Fibfast above does; power handles the rest:
        return power((n < 0 ? -n : n) >> 1, x * x,
                     n & 1 ? x : FibPair(0, 1)).lin;
    }
};

providing FibPair::FibFast as the equivalent of Fibfast. It's also possible to write an equivalent of Fibcache, using a std::vector as cache in place of its list. Finally, here's an implementation of the floating-point solution, whose computational cost (nominally) doesn't increase with the input:

#include <math.h>
int64_t Fibfloat(int8_t n) {
    static const double lnRoot5 = log(5) / 2;
    static const double lnGolden = log(.5 +sqrt(1.25));
    const double lnRaised = n * lnGolden;
    const double p = exp(lnRaised - lnRoot5);
    const double q = exp(-lnRaised - lnRoot5);
    return llround(n & 1 ? p +q : p -q);
}

Written by Eddy.