The Lin-Rado/Goucher/Rayo/Wojowu Method
The process described in the previous section defines higher functions while adding to the amount of information needed to describe each function and its definitions. All of these functions are said to be computable. "Computable" is a jargon word with a precise technical meaning, going back to the early 20th century, and it makes sense within the same textbooks that outline the Church-Turing thesis or Gödel's incompleteness theorems.
If, however, we are willing to step outside the capabilities of a computer executing one specific well-defined algorithm on a finite number of finite-valued integer inputs, we can use far fewer symbols to define a faster-growing function. We do this with a formulation of the following structure:
B(n) is the largest number computable in a way that can be completely defined in n or fewer symbols, drawn from a finite alphabet and using a finite set of combination / operation rules, all of which are agreed upon by both parties before any attempt is made to contemplate the value of B(n) for any specific argument n.
In this formulation, we aren't allowing new symbols to be added for hyper4, the Ackermann function, etc.; but we don't need to, if we choose a system that is "computationally universal", i.e. capable of computing any of the foregoing functions/operations, or others similarly defined, if given a suitable program. Such universal computing capacity is called Turing completeness or "Turing equivalence". The name probably stuck because Turing's treatment made this capability easy to assess, and it was presented at the right time; but the first formal systems and computer designs with universal capability were much older.
The Lin-Rado Busy Beaver Function
Perhaps the first and best-known such formulation is the busy beaver problem. It achieves truly staggering numbers with the fewest symbols, rules, and input data of anything we've seen so far; simply put, it's hard to beat. We'll see why soon, when we get to the next table of large numbers.
The Turing machine is often used to demonstrate fundamental principles of computation. It is equivalent to many (but not all) actual real-world computer techniques. A Turing machine consists of a state machine that has a certain (finite) number of states, an infinitely large memory (usually described as an infinitely long linear strip of paper that can be marked in two different ways) and a set of rules describing what the machine will do given its current state and the marks on the paper. The rules are limited to things like moving one way or the other on the strip, writing a symbol (like 0 or 1) on the paper, looking at what's written at the current position, changing to another state, and stopping. The rules are often described as "five-tuples": each rule is five numbers (A,B,C,D,E) and is interpreted as "If you're in state A and the tape has symbol B then go to state C, write symbol D, and move the tape E units to the right". (A must be from 1 to N, C must be from 1 to N or 'H' for halt, B and D must be 0 or 1 and E must be -1 or 1. Note that a "don't move the tape" option wouldn't gain anything, because then you'll just apply another rule and overwrite the tape anyway.).
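The five-tuple rule format can be made concrete with a short simulator. The sketch below is an illustration only (the dictionary encoding of rules is our own, not any standard), and it runs the well-known 2-state busy beaver champion machine:

```python
def run_turing_machine(rules, max_steps=10_000):
    """Run a 2-symbol Turing machine starting from a blank tape.

    rules maps (state, symbol_read) -> (next_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right), and next_state 'H' means halt.
    Returns (number_of_1s_on_tape, steps_taken), or None if no halt occurs
    within max_steps.
    """
    tape = {}          # sparse tape: unwritten cells read as 0
    pos, state = 0, 1  # start in state 1 at position 0
    for steps in range(1, max_steps + 1):
        next_state, write, move = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if next_state == 'H':
            return sum(tape.values()), steps
        state = next_state
    return None  # did not halt within max_steps

# The 2-state busy beaver champion: writes 4 ones and halts after 6 steps.
bb2 = {
    (1, 0): (2, 1, +1),
    (1, 1): (2, 1, -1),
    (2, 0): (1, 1, -1),
    (2, 1): ('H', 1, +1),
}
print(run_turing_machine(bb2))   # (4, 6)
```

Note that, as the text says, there is no "don't move" option: every applied rule writes a symbol and moves one cell.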
The Busy Beaver function was originally defined by Tibor Rado at Ohio State in 1962. It is defined by specifying that you must start with a blank tape (all 0's), with a finite number of symbols per position on the tape (we usually use two: 0 and 1), and you're limited to N states in the state machine. What is the greatest number of marks (1's) the machine can write on the tape before stopping? A set of rules that never stops doesn't count. The maximum number of marks for N states is BB(N) or BBN. This is a well-defined function and it grows very, very fast.
In this table, the column labeled "Machines" tells how many Turing machines of N states exist; this is (4N+4)^(2N) (the number that actually have to be checked is lower). The column labeled "steps" shows how many steps are taken by the current record-holder before halting. Here are some older record setters and a more detailed description of the difficulty of the problem. A good page for recent information about the problem is Marxen's own page.
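The machine-count formula can be checked directly: each of the 2N (state, symbol) pairs independently picks one of 2 symbols to write, one of 2 directions to move, and one of N+1 successor states (counting halt), giving 4(N+1) = 4N+4 choices per pair:

```python
def machine_count(n):
    # 2n rules; each rule chooses write symbol (2) x move (2) x next state (n + halt)
    return (4 * n + 4) ** (2 * n)

for n in range(1, 5):
    print(n, machine_count(n))
# 1 64
# 2 20736
# 3 16777216
# 4 25600000000
```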
When it comes to implementing fast-growing functions of integers, Turing machines appear to do a very good job of going at least as high as anything else we've defined. For example, a Turing machine with only 6 states is sufficient to implement an iterated exponential function with a chaotic, deterministic, low-probability exit condition. The machine that set the 1.29149×10^865 record is essentially performing the iteration X = 2^K × X several times in a row before halting. Few of those who work with Turing machines doubt that with only a few more states, massively higher numbers can be computed by much faster-growing functions.
When a function like "B(n)" above, or this BBN specifically, is allowed as part of the definition of a finite number, we have moved up to a higher order of "computability" than the early-20th-century type of Church, Turing and Gödel.
BBN is not "computable" in the formal sense — you cannot predict how long it might take to count the number of 1's written by all Turing machines with N states for arbitrary values of N. But for specific small values of N, it is possible to do a brute-force search, with human assistance to examine all the "non-halting" candidates and to equip your program with pattern-matching techniques that identify these as non-halting.
However, this takes massively greater amounts of work for each higher value of N, and so the Busy Beaver function is unwieldy to calculate. No-one has been able to complete the brute-force search for any value of N greater than 4.
So the Busy Beaver function is not actually a good way to calculate big numbers — for example, 10^10^27 isn't nearly as big as BB(27), but it's bigger than any BB(N) value we've been able to calculate, and it can be calculated much more quickly and easily.
The only way in which the Busy Beaver function "grows fastest" is when you look at it in terms of the function's value compared to the amount of information required to specify the formal system, the function, and the function's parameter(s). This is a highly abstract concept and shouldn't be considered important unless you are studying the theory of deterministic algorithms specified within formal systems. To understand this, you can imagine defining a precise set of rules for manipulating symbols, which define all of the functions above (starting with addition and going up through chained arrow notation, iteratively defining new functions, and so on). Each new rule, symbol and function would take a bit more information to define completely. If you wrote a computer program to compute each function, each program would be a bit more complex. You could also do the same thing by starting with a definition of the rules of the Turing machine, then start with 1-state Turing machines and increase the number of states by adding a few extra bits of information per state. It is generally believed that, as the amount of information used gets higher, the Turing machine based system will produce higher function values than any other formal system.
In other words, the Turing machine is a very concise general-purpose algorithmic computation system. Its Busy Beaver function seems to grow faster than any function definable in any other formal system, when both the system's definition and the function's arguments are counted as part of the data length.
Beyond the Busy Beaver Function
The Busy Beaver function, and anything we'll discuss after it, by necessity must go beyond functions, algorithms and computability. Imagine any sufficiently general definition of formalism (such as the Turing machine) and then define a function f(N) giving the maximum value of the results of its computation in terms of N, a suitably formalised specification of the amount of information used to define the formal system and the algorithm. f(N) will have a finite value for any finite N and can be said to grow at a certain rate. Because all f(N) are finite for all finite N, there exists a g() such that g(N) is greater than f(N) for all N.
By necessity, it is impossible to define g(N) in any specific way, because the entire realm of formal systems and algorithmic definition is already part of the definition of f(N). g(N) cannot have a clear definition: if it did, that definition would be formalizable and capable of being computed by a Turing machine, and would therefore already be part of f(N).
At this point in the discussion (or usually sooner) it becomes apparent that there is additional knowledge and assumptions "outside the system". An effort is made to identify these, define them precisely and add them into the quantity N. After doing this, it is soon discovered that the resulting formal system itself depends on things outside itself, and so on. I have encountered many expositions, discussion threads, etc. over the years, that begin with an optimistic determination to formalise the problem and quantify exactly how large numbers can be derived from first principles; they all have ended up somewhere in this jungle of abstraction. Here is a relevant quote:
I have this vision of hoards[sic] of shadowy numbers lurking out
there in the dark, beyond the small sphere of light cast by the
candle of reason. They are whsipering[sic] to each other; plotting who
knows what. Perhaps they don't like us very much for capturing their
smaller brethren with our minds. Or perhaps they just live uniquely
numberish lifestyles, out there beyond our ken.
— Douglas Reay [12]
Oracle Turing Machines
One popular approach involves the use of "oracle functions". An oracle function is a function that has a definite value but is not computable in any ordinary sense (because, for example, it might take an infinite number of steps to compute). The halting problem function (which returns 1 or 0 depending on whether or not a given Turing machine halts) and the Busy Beaver function as defined above are both good examples. An "oracle Turing machine" is a Turing machine that has the capability of computing some chosen oracle function (usually this is described as a normal Turing machine that is capable of sending a number to a connected "black box" which implements the oracle function and sends back a yes or no answer).
A single-oracle Turing machine has its own halting problem, which is different from (and in a certain way "harder" than) the halting problem for normal Turing machines. It also has its own Busy Beaver function, which might grow at a faster rate (possibly depending on which oracle function is implemented). Both remain just as impossible to compute as the original halting and Busy Beaver functions.
One can of course imagine Turing machines that have two or more oracle functions, or a single oracle function that answers questions about another type of oracle machine. If a "first order oracle machine" is a Turing machine with an oracle that computes the Busy Beaver function for normal Turing machines, then a "second order oracle machine" has an oracle that computes the Busy Beaver function for first order oracle machines, and so on.
(However an oracle function cannot answer questions about a system that incorporates itself, for the same reason that the original halting function is uncomputable. To see why, consider an oracle machine that asks its oracle "will I halt?" and then halts only if the answer is no.)
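The self-referential argument in the parenthetical above can be sketched directly. In this toy illustration (our own, purely for demonstration) the "machine" is a Python function and the oracle is any callable that claims to predict halting:

```python
def contrarian(oracle):
    """A 'machine' that asks the oracle about itself and does the opposite."""
    if oracle(contrarian):      # oracle claims: "contrarian will halt"
        while True:             # ...so loop forever, proving the oracle wrong
            pass
    return "halted"             # oracle claimed non-halting; wrong again

# Whatever the oracle answers about contrarian, its answer is wrong.
# For example, an oracle that always answers "won't halt" sees it promptly halt:
print(contrarian(lambda machine: False))   # halted
```

(The other branch — an oracle answering "will halt" — makes the function loop forever, which is why it is not demonstrated by running it.)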
Nabutovsky and Weinberger have shown that group theory can be used to define functions that grow as quickly as the Busy Beaver function of a second-order oracle Turing machine.
Declarative Computation and Combinatory Logic
So far we have approached our large number algorithms mostly through "run a set of instructions one at a time" techniques. These are part of the imperative method of computer programming and the central technique of Turing's work. The Busy Beaver function is the first step into "second-order imperative programming".
There's a second popular way to formulate systems that express large numbers, which is closely related to the declarative or "functional" method of computer programming. This includes the lambda calculus, and its minimal form, combinatory logic.
Turing's work was closely related to, and produced largely similar results to, that of Alonzo Church. The Church encoding can be used to represent data and computation operations in the lambda calculus, and computation occurs by beta-reducing assertions into results.
The lambda calculus is in some ways more convenient because (among other reasons) it allows results to be expressed without the need to figure out how those results might actually be accomplished. For this reason it is well suited to practical computing problems; that is why so much good work has been done in LISP and Haskell, for example.
As I'll mention later ("by whichever specific axioms and means..."), there are multiple approaches to this type of work. We'll eventually get to Peano arithmetic, set theory and first-order formal logic. These approaches might be avoided for various reasons, including Gödelian incompleteness or because they simply aren't needed for constructing the desired result.
In the case of large numbers like those we've managed so far, we need only a three-symbol combinatory logic calculus combined with a simple (first-order) oracle for its own version of the halting problem. This three-symbol calculus uses the symbols I (identity), K (constant) and S (substitution) on parenthesised expressions that are equivalent to binary trees: every pair of parentheses contains two entities, each of which is either a single symbol or a pair of parentheses that itself contains two entities. The SKI combinator calculus or "SKI calculus" is equivalent to the more commonly-known lambda calculus of Alonzo Church.
Any study of lambda calculus defines symbols for needed terms, operations, functions, etc. as it goes along (see the Wikipedia lambda calculus for examples). The SKI calculus might seem simpler in that we're just sticking to these symbols and the parentheses, but it is equally powerful. In particular, S, K, and I are just three of the commonly-abbreviated combinators of the lambda calculus:
I := λx.x
K := λx.λy.x
S := λx.λy.λz.x z (y z)
The SKI calculus is close to being the simplest system that provides all of the power of the lambda calculus (in fact, only S and K are needed, because ((SK)K) is equivalent to I). Anything in the SKI calculus can be converted to an equivalent form in lambda calculus, and vice versa [26]. Therefore (by the Church-Turing thesis), the SKI calculus itself is just as powerful, from a theory-of-computation viewpoint, as Turing machines: for every Turing machine there is a parenthesised expression that is valid in the SKI calculus and that, when beta-reduced, produces a result analogous to the final state (including tape) of the Turing machine.
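The claim that ((SK)K) behaves exactly like I can be checked directly by encoding the three combinators as curried functions (a standard encoding, used here purely for illustration):

```python
# The three combinators as curried Python functions.
I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))

# ((SK)K) applied to any value v returns v, i.e. it behaves exactly like I:
print(S(K)(K)(42))   # 42
print(I(42))         # 42

# The S rule itself: Sxyz => xz(yz). With x = K, y = I, z = 7:
# K(7)(I(7)) => 7
print(S(K)(I)(7))    # 7
```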
The reduction rules (where x, y and z stand for arbitrary subexpressions) are:
Ix => x
Kxy => x
Sxyz => xz(yz)
As you might imagine, then, SKI calculus has its versions of the "halting problem" and the "busy beaver function".
SKI's version of the halting problem is determining if beta-reduction ever stops:
h(S) := true iff beta-reduction of S terminates
Alternatively, (for the oracle that we'll get to soon) the halting function tells whether beta-reduction produces just a single symbol "I":
O(S) := true iff beta-reduction of S produces "I"
Examples of the different types of beta-reduction results (these particular expressions are our own illustrations): ((KI)S) reduces to the single symbol I, so h is true and O is true; ((KS)I) reduces to S, so h is true but O is false; and (((SI)I)((SI)I)) reduces back to itself endlessly, so h is false.
For the "busy beaver function" of SKI calculus, we define the length of a string in the obvious way:
l(S) := length of the string S (number of symbols not including parentheses)
and the "output" of a string is the length of the result of beta-reducing it:
o(S) := the length of the final string after beta-reduction, or undefined if h(S) is false
then the "busy beaver function" of SKI calculus is:
bb(n) := the largest value o(S) of any string S for which l(S)=n and h(S) is true
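Because h(S) is undecidable, bb(n) cannot be computed outright; but a lower bound can be found by brute force, treating any string that fails to settle within a step limit as non-halting. The sketch below is our own illustration (terms are represented as nested pairs, and reduction is leftmost-outermost):

```python
# Terms: 'S', 'K', 'I' (leaves), or a pair (f, x) meaning "f applied to x".

def step(t):
    """Perform one leftmost-outermost reduction step; return (new_term, changed)."""
    if isinstance(t, str):
        return t, False
    # Unwind the application spine: t = head arg1 arg2 ... argk
    head, args = t, []
    while isinstance(head, tuple):
        head, arg = head
        args.append(arg)
    args.reverse()
    if head == 'I' and len(args) >= 1:       # Ix => x
        new, rest = args[0], args[1:]
    elif head == 'K' and len(args) >= 2:     # Kxy => x
        new, rest = args[0], args[2:]
    elif head == 'S' and len(args) >= 3:     # Sxyz => xz(yz)
        x, y, z = args[:3]
        new, rest = ((x, z), (y, z)), args[3:]
    else:
        # No redex at the head; reduce the first reducible argument instead.
        for i, a in enumerate(args):
            ra, changed = step(a)
            if changed:
                args[i] = ra
                out = head
                for a2 in args:
                    out = (out, a2)
                return out, True
        return t, False
    for a2 in rest:
        new = (new, a2)
    return new, True

def normalize(t, max_steps):
    """Reduce to normal form, or None if it doesn't settle within max_steps."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    return None

def size(t):
    """l(S): number of symbols, not counting parentheses."""
    return 1 if isinstance(t, str) else size(t[0]) + size(t[1])

def trees(n):
    """All parenthesised expressions with n symbols drawn from S, K, I."""
    if n == 1:
        yield from 'SKI'
    else:
        for k in range(1, n):
            for left in trees(k):
                for right in trees(n - k):
                    yield (left, right)

def bb_lower(n, max_steps=100):
    """Lower bound on bb(n); unsettled terms are treated as non-halting."""
    best = 0
    for t in trees(n):
        nf = normalize(t, max_steps)
        if nf is not None:
            best = max(best, size(nf))
    return best

print(bb_lower(1), bb_lower(2), bb_lower(3))   # 1 2 3
```

For these tiny sizes no term can out-grow its own length, so the lower bound equals n; the interesting growth only appears at larger n, exactly where the step-limit heuristic stops being trustworthy.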
Adam Goucher's Ξ(n)
Some time after Rayo's Number came along (we'll get to it later), Adam Goucher attempted to define a large number in a manner like that of the Lin-Rado Busy Beaver function. He recognised that this combinatory logic system was only equal to Turing machines, and that its bb(n) would grow comparably to the Lin-Rado BB(n). To make his definition bigger, he added the oracle symbol O, which indeed makes for a faster-growing bb(n) function. His description defines what is essentially first-order arithmetic, i.e. a system like that coming out of the Peano axioms above, along with a rich set of predefined things like the operators + and ×, an infinite alphabet of variables, etc. As such, it looks a lot like Hofstadter's Typographical Number Theory. Its computational capabilities are equivalent to normal Turing machines and the SKI calculus.
Adding the oracle operator O brings the system up to the level of second-order arithmetic. Goucher then defined the Xi function Ξ(n):
Ξ(n) = the largest value o(S) of any string S for which l(S)=n and h(S) is true
This is the same as bb(n) above, but with S allowed to include O symbols in addition to S, K, and I.
Finally, Goucher's number is defined as Ξ(106).
In this construction, Goucher confused matters a bit by referring to the first-order arithmetic as "first-order logic", and then asserting that Rayo's number is defined in terms of that "first-order logic"; but as we'll see later, Rayo's number is more like a busy-beaver function for first-order set theory, which is more powerful than nth-order arithmetic.
For more on Goucher's Xi function, see its Googology page: Xi function.
Japanese readers should see: 巨大数論 ("Theory of Large Numbers", from @kyodaisuu on Twitter)
If you like this you might also enjoy my numbers page.