Large Numbers
First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 7)
Bowers' Array Notation (4element Subset)
In order to generalize his operators while also making it easy to extend the system further, Bowers created his array notation. The 3element version of Bowers array notation was already covered above. It is easy to convert from the extended operator notation to an array notation version of the same number — sacrificing a bit by making the rules for "expanding" a number from the array notation a little complex.
All of the operators defined thus far can be expressed as an array with up to four elements, as follows:
[a] = a
[a,b] = a+b
[a,b,1] = [a,b] = a+b = a^{①}b
[a,b,2] = a×b = a^{②}b
[a,b,3] = a^{b} = a^{③}b
[a,b,c] = a <c> b = hy(a,c,b)
[a,b,c,1] = a <c> b (combining a and b with the c^{th} operator from the added, multiplied, exponentiated, ... sequence)
[a,b,c,2] = a <<c>> b (combining a and b with the c^{th} operator from the expanded, multiexpanded, powerexpanded, ... sequence)
[a,b,c,3] = a <<<c>>> b (combining a and b with the c^{th} operator from the exploded, multiexploded, powerexploded, ... sequence)
Here are the rules:
1. For one and twoelement arrays, just add the elements. [a] = a
and [a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's,
remove them: [a,b,1,1] = [a,b];
[a,b,c,1] = [a,b,c], etc.
3. If neither previous rule applies,
and the 2nd entry is a 1, remove all but the first element:
[a,1,b,c] = [a] = a.
4. If none of the previous rules applies, and the 3rd entry is a 1:
[a,b,1,c] becomes [a,a,[a,b1,1,c],c1]
5. Otherwise all four elements are greater than 1: [a,b,c,d]
becomes [a,[a,b1,c,d],c1,d]
At this point, it is interesting to note how similar the Bowers array rules are to the definition of a recursive function like the Ackermann function. The Ackermann function was originally developed with the restriction that functions must be defined entirely in terms of calls to themselves and each other, and in terms of the "successor function" s(x) = x+1. The 5 rules above can be restated:
12. f(s(s(a)),1,1,1) = a
1b. f(s(s(a)),s(s(s(b))),1,1) = f(s(s(s(a))),s(s(b)),1,1)
3. f(s(s(a)),1,s(s(b)),s(s(c))) = a
4. f(s(s(a)),s(s(s(b))),1,s(s(s(c)))) = f(s(s(a)),s(s(a)),f(s(s(a)),s(s(b)),1,s(s(s(c)))),s(s(c)))
5. f(s(s(a)),s(s(s(b))),s(s(s(c))),s(s(d))) = f(1,f(1,s(s(b)),s(s(s(c))),s(s(d))),s(s(c)),s(s(d)))
Here is an example of applying the rules to the simplest nontrivial 4element array:
[2,2,2,2] = [2,[2,1,2,2],1,2] (by rule 5)
[2,1,2,2] = 2 (by rule 3)
so we have [2,2,1,2]
[2,2,1,2] = [2,2,[2,1,1,2],1] (by rule 4)
[2,1,1,2] = 2 (by rule 3)
so we have [2,2,2,1]
[2,2,2,1] = [2,2,2] (by rule 2}
[2,2,2] = [2,[2,1,2],1] (by rule 5)
[2,1,2] = 2 (by rule 3)
so we have [2,2,1]
[2,2,1] = [2,2] (by rule 2)
[2,2] = 2+2 = 4 (by rule 1)
With a little effort you can see that anything starting with [2,2 is equal to 4. To get anything bigger than 4, you have to have at least one 3. Here is the simplest example:
[3,2,1,2] = [3,3,[3,1,1,2],1] (by rule 4)
= [3,3,3,1] (because [3,1,1,2] = 3 by rule 3)
= [3,3,3] (by rule 2)
Once it is reduced to a 3element array, we can convert to hyper operator notation as established earlier. So [3,2,1,2] is 3^{③}3 = 3^{3} = 27. Now using the extended operator <<1>>, 3 <<1>> 2 = 3 {3} 3. This is the same as [3,2,1,2}.
Here is another example:
[3,2,2,2] = [3,[3,1,2,2],1,2] (by rule 5)
= [3,3,1,2] (because [3,1,2,2] = 3 by rule 3)
= [3,3,[3,2,1,2],1] (by rule 4)
= [3,3,[3,3,[3,1,1,2],1],1] (by rule 4 again)
= [3,3,[3,3,3,1],1] (because [3,1,1,2] = 3 by rule 3)
= [3,3,[3,3,3]] (by rule 2)
= [3,3,[3,[3,2,3],2]] (by rule 5)
= [3,3,[3,[3,[3,1,3],2],2]] (by rule 5)
= [3,3,[3,[3,3,2],2]] ([3,1,3] = 3 by rule 3)
= [3,3,[3,[3,[3,2,2],1],2]] (by rule 5)
= [3,3,[3,[3,[3,[3,1,2],1],1],2]] (by rule 5)
= [3,3,[3,[3,[3,3]],2]] (by rules 2 and 3)
= [3,3,[3,[3,6],2]] (by rule 1)
= [3,3,[3,9,2]] (by rule 1)
... (about 8 repeats of rules 5 and 1 to turn [3,9,2] into 27)
= [3,3,27] = 3 {27} 3 = 3^{㉗}3 = 3 {26} (3 {26} (3...))
This is equivalent to 3 <<2>> 2, which expands as follows:
3 <<2>> 2 = 3 <<1>> 3 = 3 <3 <3> 3> 3 = 3 <27> 3 = 3^{㉗}3
3 <<2>> 2 is the same as [3,2,2,2]. In fact, the rules for the 4element array notation are equivalent to definitions of the extended operators. The array [a,b,c,2] is equal to a <<c>> b; [a,b,c,3] is a <<<c>>> b; and in general [a,b,c,d] is a <<<<<c>>>>> b with d sets of brackets around the c.
in 2006 Chris Bird proved[45] that a 4element array [a,b,c,d] is larger than a→a→a→...→a→(b1)→(c1), using Conway's chained arrow notation, and with d occurrences of a in the chain (provided that a>2, b>1, c>0 and d>1). (Bird exhibited far more patience than I, who was merely satisfied with Bowers' own assessment that a 5element array [n,n,n,n,n] is at least as large as an nelement chain n→n→n→...→n.)
Bowers Arrays with 5 or More Elements
Of course, Bowers wanted to extend the system, so the rules were designed to work with arrays of arbitrary length. This is done by changing rules 4 and 5 to the following:
4. If none of the previous rules applies, and the 3rd entry is a 1: Define the variables a,b,S,d and R so that the array is [a,b,S,1,d,R] where a,b are the first two elements, [S,1] is the string of 1 or more 1's; d is the first element bigger than 1 and [R] is the remaining part of the array. Replace the array with [a,a,S',[a,b1,S,1,d,R],d1,R] where [S'] is a string of a's of equal length as string [S].
5. If none of the previous rules applies, replace the second element: [a,b,c,R] becomes [a,[a,b1,c,R],c1,R]
I am fairly well convinced that Bowers is right in stating that the value represented by the 5element array [n,n,n,n,n] is at least as large as n→n→n→...→n in the Conway chainedarrow notation, where there are n items in the chain.
For more on Bowers' notation, including updated definitions and a great many more steps in defining new recursive functions, read here: Array Notation, or check the newer, longer version here: Exploding Array Function. There are some other Bowers links below.
Generalized Invention of Recursive Functions
At this point it is best to just describe the general process. Larger numbers are described by defining various types of recursive functions, always defined in terms of themselves and other previously defined recursive functions. Each new definition adds a little more complexity to the system. In order to understand any one function, you have to understand all the functions it is defined in terms of. Once you have defined a new function, you can invoke it with larger and larger arguments: f(2), f(10), f(f(1000)), etc. until the amount of digits and notation symbols becomes inconvenient, then you define a new function g(x).
It is important to note that you keep adding information: plugging in larger numbers like 2, 10, 1000 increases the information, and defining functions greatly increases the information. In general, larger numbers require more information.
But defining functions is just an operation in itself. If you define a standardized way to define new functions, then you can abstractify the process of defining the functions, and define a new function based in the idea of iterating the process of defining functions. This requires modeling the process of recursive definition and computation, something that can be done with, say, a computer program that emulates another simpler computer.
This is a jump into a secondhigher level of abstraction. Just as arithmetic is an algorithmic abstractification of counting, and defining functions is an algorithmic abstractifcation of the mechanics of arithmetic, this new process of automatically repeatedly defining functions is an abstractification of that.
All of these ideas were formalized and the process of algorithmic abstractification was studied in the theory of computation by Turing, Church and Kleene, among others. They showed that all algorithmic processes within a certain limited definition of algorithmic process could be reproduced by a certain, minimal definition of computation, and used that model to show that there were certain limits to what types of things could be computed.
Formal Grammars
If the foregoing makes little sense, consider this concrete (but somewhat nonrigorous) example. Select any well defined, "sufficiently powerful" grammar G, consisting of a symbolset of S symbols, finite in number, and welldefined rules of what constitutes a syntactically valid string of symbols specifying an integer. An example grammar that should be fairly familiar uses the symbols:
0 1 2 3 4 5 6 7 8 9 + * ^ ( )
and the rules that these symbols are to be strung together to make a legal set of additions, multiplications and exponentiations yielding an integer result; in this example S = 15. Just to be unambiguous, we'll require parentheses whenever two operators appear in a string.
Given this grammar G, for every integer N there is a set of integers E_{N} consisting of all the integers that can be specified as a combination of N symbols in G using G's defined grammar. This set is finite, (it has, at most, S^{N} elements, since there are that many combinations of N symbols from a set of S). Since E_{N} has a finite number of elements, it therefore has a maximum element. Define m(N) to be a new function (not a part of the grammar G) giving the value of this maximum expressible integer in the grammar G for each N. Now we have a function which is guaranteed to grow at least as fast as any function defined within G, or faster. (Technically, it is only guaranteed to grow faster above a certain minimum value of N — this is part of what we vaguely called "sufficiently powerful"). In any case, this function, or any larger function definition from f(x) = m(x) + 1 to f(x) = m(m(m(x))) or beyond, can be defined as part of a new, larger grammar G' incorporating all of the definitions of G plus the new definition of f().
So, in the specific example given here, we find in particular that for N = 1, N = 3, N = 7, N = 11, the largest expressible integers in G are:
9, therefore m(1) = 9
9^9, therefore m(3) = 9 ^ 9
9^(9^9), therefore m(7) = 9^{④}3
9^(9^(9^9)), therefore m(11) = 9^{④}4
and in general for N = 4 X + 3 for any integer X > 0,
m(N) = 9^{④}(X+2)
Since N is always larger than X + 2 we can define our new grammar G' just by adding the symbol:
h
(which represents the ^{④} or hyper4 operator) and the new syntax:
a h b
where a and b are valid strings, and interpreted as a^{④}b. This function grows faster than G's m(x) function. In this new grammar, which we now call G', m(3) is 9^{④}9, m(4) is 9^{④}99 and m(7) is 9^{④}(9^{④}9), so the process can continue to grammar G'' and so on. If you continue the same idea indefinitely you just get higher hyper operators, but you could also define new symbols using the ideas given above — the Ackermann function, the Conway chainedarrow notation, etc. At each stage you have a grammar G^{x} with its maximal function m(n) to which the same idea can be applied to generate another bigger function.
The LinRado Busy Beaver Function
The process described in the previous section defines higher functions while adding to the amount of information necessary to describe the function and its formal system. To do this in absolutely the least amount of data, it's hard to beat the busy beaver problem.
The Turing machine is often used to demonstrate fundamental principles of computation. It is equivalent to many (but not all) actual realworld computer techniques. A Turing machine consists of a state machine that has a certain (finite) number of states, an infinitely large memory (usually described as an infinitely long linear strip of paper that can be marked in two different ways) and a set of rules describing what the machine will do given its current state and the marks on the paper. The rules are limited to things like moving one way or the other on the strip, writing a symbol (like 0 or 1) on the paper, looking at what's written at the current position, changing to another state, and stopping. The rules are often described as "fivetuples": each rule is five numbers (A,B,C,D,E) and is interpreted as "If you're in state A and the tape has symbol B then go to state C, write symbol D, and move the tape E units to the right". (A must be from 1 to N, C must be from 1 to N or 'H' for halt, B and D must be 0 or 1 and E must be 1 or 1. Note that a "don't move the tape" option wouldn't gain anything, because then you'll just apply another rule and overwrite the tape anyway.).
The Busy Beaver Function was originally defined by Tibor Rado at Ohio State in 1962. It is defined by specifying that you must start with a blank tape (all 0's), with a finite number of symbols per position on the tape (we usually use two: 0 and 1) and you're limited to N states in the state machine. What is the most number of marks (1's) you can have it write on the tape before stopping? A set of rules that never stops doesn't count. The maximum number of marks for N states is BB_{N}. This is a welldefined function and it grows very very fast.
In this table, the column labeled "Machines" tells how many Turing machines of N states exist; this is (4N+4)^{2N} (the number that actually have to be checked is lower). The column labeled "steps" shows how many steps are taken by the current recordholder before halting. Here are some older record setters and a more detailed description of the difficulty of the problem. A good page for recent infomation about the problem is Marxen's own page.

When it comes to implementing fastgrowing functions of integers, Turing machines appear to do a very good job of going at least as high as anything else we've defined. For example, a Turing machine with only 6 states is sufficient to implement an interated exponential function with a chaotic deterministic lowprobability exit condition. The machine that set the 1.29149×10^{865} record is essentially performing the iteration X=2^{K×X} several times in a row before halting. There are few involved with Turing machines who doubt that with only a few more states, massively higher numbers can be computed by much fastergrowing functions.
BB_{N} is not "computable" in the formal sense — you cannot predict how long it might take to count the number of 1's written by all Turing Machines with N states for arbitrary values of N. But for specific small values of N, it is possible to do a bruteforce search, with human assistance to examine all the "nonhalting" candidates and equip your program with patternmatching techniques to identify these as nonhalting.
However, this takes massively greater amounts of work for each higher value of N, and so the Busy Beaver function is unwieldy to calculate. Noone has been able to complete the bruteforce search for any value of N greater than 4.
So the Busy Beaver function is not actually a good way to calculate big numbers — for example, 10^{1027} isn't nearly as big as BB_{27}, but it's bigger than any BB_{N} value we've been able to calculate, and it can be calculated much more quickly and easily.
The only way in which the Busy Beaver function "grows fastest" is when you look at it in terms of the function's value compared to the amount of information required to specify the formal system, the function, and the function's parameter(s). This is a highly abstract concept and shouldn't be considered important unless you are studying the theory of deterministic algorithms specified within formal systems. To understand this, you can imagine, defining a precise set of rules for manipulating symbols, which define all of the functions above (starting with addition and going up through chained arrow notation, iteratively defining new functions, and so on). Each new rule, symbol and function would take a bit more information to define completely. If you wrote a computer program to compute each function, each program would be a bit more complex. You could also do the same thing by starting with a definition of the rules of the Turing machine, then start with 1state Turing machines and then increase the number of states by adding a few extra bits of information per state. It is generally believed that, as the amount of information used gets higher, the Turing machine based system will produce higher function values than any other formal system.
In other words, the Turing machine is believed to be the most concise known generalpurpose algorithmic computation system. It seems to grow faster than any other function in any other formal system when both the system's definition and the function's arguments are counted as part of the data length.
Beyond the Busy Beaver Function
Attempts to go beyond the Busy Beaver function, by necessity, have to go beyond functions, algorithms and formal systems. Imagine any sufficiently general definition of formalism (such as the Turing machine) and then define a function f(N) giving the maximum value of the results of its computation in terms of N, a suitably formalized specification of the amount of information used to define the formal system and the algorithm. f(N) will have a finite value for any finite N and can be said to grow at a certain rate. Because all f(N) are finite for all finite N, there exists a g() such that g(N) is greater than f(N) for all N.
By necessity, it is impossible to define g(N) in any specific way because the entire realm of formal systems and algorithmic definition is already a part of the definition of f(N). By necessity, g(N) cannot have a clear definition: if it did that definition is formalizable and capable of being computed by the Turing machine, and is therefore already part of f(N).
At this point in the discussion (or usually sooner) it becomes apparent that there is additional knowledge and assumptions "outside the system". An effort is made to identify these, define them precisely and add them into the quantity N. After doing this, it is soon discovered that the resulting formal system itself depends on things outside itself, and so on. I have encountered many expositions, discussion threads, etc. over the years, that begin with an optimistic determination to formalize the problem and quantify exactly how large numbers can be derived from first principles; they all have ended up somewhere in this jungle of abstraction. Here is a relevent quote:
I have this vision of hoards[sic] of shadowy numbers lurking out
there in the dark, beyond the small sphere of light cast by the
candle of reason. They are whsipering to each other; plotting who
knows what. Perhaps they don't like us very much for capturing their
smaller brethren with our minds. Or perhaps they just live uniquely
numberish lifestyles, out there beyond our ken.
— Douglas Reay^{12}
Oracle Turing Machines
One popular approach was involves the use of "oracle functions". An oracle function is a function that has a definite value but is not computable in any ordinary sense (because for example it might take an infnite number of steps to compute). The halting problem function (which returns 1 or 0 depending on whether or not a given Turing machine halts) and the Busy Beaver function as defined above are both good examples. An "oracle Turing machine" is a Turing machine that has the capability of computing some chosen oracle function (usually this is described as a normal Turing machine that is capable of sending a number to a connected "black box" which implements the oracle function and sends back a yes or no answer).
A singleoracle Turing machine has its own halting problem which is different (and in a certain way "harder") than the halting problem for normal Turing machines. It also has its own Busy Beaver function, which might grow at a faster rate (possibly depending on what oracle function is implemented). Both of these are just as difficult as the original Halting and Busy Beaver functions.
One can of course imagine Turing machines that have two or more oracle functions, or a single oracle function that answers questions about another type of oracle machine. If a "first order oracle machine" is a Turing machine with an oracle that computes the Busy Beaver function for normal Turing machines, then a "second order oracle machine" has an oracle that computes the Busy Beaver function for first order oracle machines, and so on.
(However an oracle function cannot answer questions about a system that incorporates itself, for the same reason that the original halting function is uncomputable. To see why, consider an oracle machine that asks its oracle "will I halt?" and then halts only if the answer is no.)
Nabutovsky and Weinberger have shown[47] that group theory can be used to define functions that grow as quickly as the Busy Beaver function of a secondorder oracle Turing machine.
The Frontier
As of this writing (last checked in 2007), this seems to be the frontier of development for the expression of large finite numbers. Of course, many people try, but everything seen so far appears to duplicate or fall short of the results of computation theory.
If you're interested in defining larger functions, go right ahead, but please note that you'll want to check your new function to see if it really pushes the limits a significant amount. If you use any of the methods described above, then your new function will not push the limits a significant amount.
Transfinite and Infinite Numbers
Beyond all the finite numbers are transfinite numbers and infinities. Once we go beyond finite numbers, we enter an area where it is essential to define exactly what theory of numbers we're working in.
Most number theory follows the axiomatic method, a discipline established by Euclid in the study of geometry and later adapted to every other branch of mathematics. By the axiomatic method, results are found by starting with a set of axioms and strictly following a set of rules to derive new results. This technique seemed flawless until the development of nonEuclidean geometry in the 19th century, which showed that one could construct equally valid, useful, and consistent versions of a given type of mathematics (e.g. geometry) by starting with a different set of axioms. Mathematicians were even more surprised in the 1920's when Gödel showed that no (sufficiently powerful) axiomatic system of number theory can prove all statements which are true in that system. It is now agreed that this phenomenon of incompleteness is a property of all axiomatic systems.
Depending on what type of number theory you're looking at, there may or may not be transfinite numbers and there may or may not be a plurality of infinities. These differences result from the use of different axioms and rules for deriving results. Different axioms and rules lead to different results including different answers to the question what lies beyond all the integers?. Because different systems are useful for different things and none can generate all useful results (due to incompleteness as demonstrated by Gödel) we end up with several different 'right answers' to the question. None is the 'best' answer, but some are more popular than others. (The term transfinite itself is a result of this — it was Cantor's effort to avoid using the term infinite for certain quantities that were definitely not finite, but did not share all the properties of what he considered truly infinite, and now called "Absolute Infinite".)
In the discussion to follow, it is often difficult or even meaningless to compare the various definitions of infinities to each other, trying to determine which is larger. However, within any one number theory system the infinities can usually be put into a clear order.
Georg Cantor developed two different systems of infinities, called ordinal and cardinal, out of his work in set theory during the 1870's and 1880's. His work still suffices for most purposes (much as Newton's physics is sufficient for most engineers).
First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 7)
If you like this you might also enjoy my numbers page.
s.11