Large Numbers  




Inventing New Operators and Functions

The concept of the "classes" described so far does quite well at handling everything that can be done with exponents, the most powerful operator known to most people. To proceed further we begin to invent new operators. This practice of inventing new operators continues over and over again as you go to higher and higher large numbers. The new operators overcome the limits of the old operators, limits that are reached as the old notation becomes unwieldy.

For example, class-1 numbers are written in traditional place-value notation, which is essentially abbreviated addition and multiplication:

3158 = ((3 × 10 + 1) × 10 + 5) × 10 + 8

Although we don't normally think of it that way, the place-value notation avoids the unwieldy use of lots of symbols.
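
As a tiny illustration (my own, just to make the "abbreviated addition and multiplication" point concrete), the nested expansion above is exactly the multiply-and-add scheme known as Horner's method:

    def place_value(digits, base=10):
        # Expand place-value digits by nested multiply-and-add,
        # as in the 3158 example above.
        n = 0
        for d in digits:
            n = n * base + d
        return n

    print(place_value([3, 1, 5, 8]))   # 3158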

When expressing larger numbers, like Avogadro's number and googol, one usually uses exponents and power towers, as discussed above:

6.02 × 10^23, 10^100, 10^10^100, 27^256^3125^46656, etc.

but after a while that becomes unwieldy too. Eventually there are so many exponents that the expression cannot be written on a page. Then it becomes a good idea to invent a new shorthand, which amounts to defining a new operator.

Function Hierarchies

Before going on to actual invented operators and functions, it is useful to consider how functions can be meaningfully put into an order.

In some branches of mathematics and computer science it is common to express the "asymptotic growth rate" in terms of one of the "Bachmann-Landau notations" such as Big O notation. Usually these are used to put functions into growth categories, such as linear O(x), quadratic O(x^2), exponential O(2^x), self-exponential O(x^x), and so on. Each category includes functions that grow fast enough that they "eventually" exceed any function in the lower categories.

The concept of "eventually dominates" is often used, particularly in the types of function hierarchies that are of interest to us here. By the most common usage, eventual dominance is defined like this:

g() eventually dominates f() if and only if there is some m such that, for any x>m, g(x) > f(x).

By this definition, g(x) = 2x is said to eventually dominate f(x) = x+10, because for large enough arguments (in this case for x>10) g(x) is larger than f(x).
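
A finite search can illustrate (though of course never prove) eventual dominance; a quick sketch:

    def last_failure(f, g, limit=10**4):
        # Largest x <= limit where g(x) <= f(x), or None if g always wins.
        m = None
        for x in range(1, limit + 1):
            if g(x) <= f(x):
                m = x
        return m

    # For f(x) = x + 10 and g(x) = 2x, the last failure is at x = 10,
    # matching the "for x > 10" statement above.
    print(last_failure(lambda x: x + 10, lambda x: 2 * x))   # 10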

However, the above definition also means that g(x) = x+1 is said to eventually dominate f(x) = x. For many purposes this is a meaningless distinction, and various solutions are used. Sbiis Saibian defines a variant called "dominance over translations" which is similar to the above, except that there is the further requirement that if you add a constant k to the argument of f(), the dominance still happens no matter the value of k. This ensures that x+1 is not considered to dominate over x.

Let's stick with the common definition of eventual dominance, but for now just ignore any variants that differ only by adding a constant, and let's put some functions into an ordering.

The first set of functions can be linear multiples of x:

f1(x) = x,  f2(x) = 2x,  f3(x) = 3x,  f4(x) = 4x,  f5(x) = 5x,  . . . ,  fk(x) = kx

Each of these grows more quickly than the one before it. Note that I have given each function a name by using "f" with a subscript. There is a function for each positive integer k.

Another set of functions might be the powers of constants (commonly called "exponential growth rates", including the powers of 2, powers of 3, etc.):

g1(x) = 1^x,  g2(x) = 2^x,  g3(x) = 3^x,  g4(x) = 4^x,  g5(x) = 5^x,  . . . ,  gk(x) = k^x

We'll define more later, but let's try to see how all functions can be put into a single ordering. Immediately we confront an issue.

Why Function Hierarchies Require a Transfinite Ordinal Index

As you can see, in the previous section we defined two sets of functions, fk() and gk(), for positive integer k. To prove useful results about large numbers we want to have all of the functions in a single indexed family, say fn() for some index n.

But there are two problems: first, g1(x) = 1^x = 1 is a constant function, so it does not dominate anything and has no place in an ordering by growth rate; and second, the fk() already use up every positive integer as an index, leaving no integer indices for the gk().

The first problem is easily addressed, if we simply leave out g1(x) = 1^x = 1 and instead start with g2().

However the second problem is a bit harder.

We could simply limit the first group to a finite size, say 100 linear functions going from f1(x) = x to f100(x) = 100x. Then add the g() functions with an index starting at 101, or maybe 102 since we're also dropping g1().

However mathematicians never like to do that. Instead, they solve the problem using transfinite ordinals. This is a topic that gets covered far later in this article, but for now we'll just use the Greek letter "omega" ω to represent "infinity", and use the rule that we are allowed to use numbers like ω+1 to mean "infinity plus one".

By using transfinite ordinals, the previous two groups of functions can be combined into one:

The Munafo Function Hierarchy (first attempt)

f1      f2      f3      f4      f5      . . .   fk
x       2x      3x      4x      5x      . . .   kx

fω+2    fω+3    fω+4    fω+5    . . .   fω+k
2^x     3^x     4^x     5^x     . . .   k^x

Notice we also dealt with the g1(x) = 1 problem by not using the index ω+1.

Why There are Competing Function Hierarchies

Using Cantor ordinals for the index allows us to get quite far in our analysis of definitions of large numbers. However, there is another problem, which recurs a lot in this sort of work: the example above skips some functions.

Suppose we want to add the constant powers (commonly called "polynomial growth rates" starting with quadratic, cubic, etc.):

h1(x) = x,  h2(x) = x^2,  h3(x) = x^3,  h4(x) = x^4,  h5(x) = x^5,  . . . ,  hk(x) = x^k

Most of these fit between the two rows of functions in the earlier table. As before there is a function that does not, this time it is h1(x) = x. As before we can leave it out. The rest clearly grow faster than the linear functions, and the first of the exponential functions 2x grows faster than all of these.

However, if we want to add this row to our table of functions named by fn() with a subscript n, we cannot simply insert this row into the table, because there are no available Cantor ordinals between the integer-indexed ones fk() and the ones with ω in the index, fω+k(). In order to incorporate these polynomial functions, we would need to do something with the indices. Here is one possible solution:

The Munafo Function Hierarchy (second attempt)

f1      f2      f3      f4      f5      . . .   fk
x       2x      3x      4x      5x      . . .   kx

fω+2    fω+3    fω+4    fω+5    . . .   fω+k
x^2     x^3     x^4     x^5     . . .   x^k

f2ω+2   f2ω+3   f2ω+4   f2ω+5   . . .   f2ω+k
2^x     3^x     4^x     5^x     . . .   k^x

Here we have renumbered the exponential functions so that they start at 2ω (strictly speaking ω·2, since ordinal multiplication is not commutative). This makes enough room to insert the infinite class of polynomial functions, for which we re-use the indices that start with ω.

But, as you can probably guess, this problem of incompleteness has not gone away. More rows of functions can be inserted:

hk(x) = k ln(x)

("kx" row is here)

hk(x) = kx + ln(x)

hk(x) = kx ln(x)

("x^k" row is here)

hk(x) = x^k + x

hk(x) = kx · x^k = k x^(k+1)

("k^x" row is here)

hk(x) = k^(2x) = (k^x)^2

It is also possible to insert entire infinite classes of functions between two members of the same row. This one shows what you probably already realised: there are infinite sets of polynomial functions between the polynomials we already considered:

hk(x) = x^3 + kx + 7
    (goes between x^3 and x^4)

There are also (of course) entire hierarchies of polynomials between successive members of the exponential functions:

hk(x) = 2^x + k x^3 + 2x
    (goes between 2^x and 3^x)

These functions might not be as commonly used or as interesting, but they illustrate a principle that applies to hierarchies of functions in general: In order for a hierarchy to be useful, you have to make some precise definitions regarding what functions will be included, and what indices each function will get; and (perhaps less satisfying) you have to accept the fact that any hierarchy will leave out a lot.

This last fact, that there will always be functions left out, leads to the unavoidable fact that in some cases, it will be difficult to definitively say which of two large number definitions is larger. Often, the best we can do is to say something like "They both lie between fω+1(63) and fω+1(64)".

Beyond Exponents: the hyper4 Operator

(most commonly called "tetration")

(my early name for this was "powerlog")

The first new operators used by those seeking large numbers are usually higher dyadic operators. A dyadic operator is one that has two arguments — two numbers that it acts on. Usually in notation the operator is placed between the two numbers.

The most common higher dyadic operators follow the pattern set by the well-known three (addition, multiplication and exponentiation). These operators come up a lot in the definitions of large numbers that are to follow.

Following an obvious pattern in the three common operators, the new operator can be defined as shown here:

operation       representation                  absolute definition    inductive definition
addition        a + b                           -                      successor(a + (b-1))  or  successor((a-1) + b)
multiplication  a×b  or  a*b  or  a·b  or  ab   a + a + ... + a        a + (a×(b-1))  or  (a×(b-1)) + a
exponentiation  a^b  or  a↑b                    a × a × ... × a        a × (a^(b-1))  or  (a^(b-1)) × a
hyper4          a^^b  or  a↑↑b                  a^(a^( ... ^a))        a^(a↑↑(b-1))
hyper4          a^^b  or  a↑↑b                  ((a^a)^ ... )^a        (a↑↑(b-1))^a

Note that for the last operator, there are two ways to interpret the absolute and inductive definitions, producing different hyper4 operators. In common practice, the first one is used, because the other one can be reduced to a combination of two exponent operators: the left-associative tower ((a^a)^...)^a with b a's is equal to a^(a^(b-1)), and thus it does not really count as a new operator.
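
Both interpretations are easy to play with in code. A minimal sketch in Python (the names tetrate and tetrate_left are my own labels, not established terminology):

    def tetrate(a, b):
        # Right-associative hyper4: a^(a^(...^a)) with b copies of a.
        result = 1
        for _ in range(b):
            result = a ** result
        return result

    def tetrate_left(a, b):
        # Left-associative variant: ((a^a)^a)...^a with b copies of a.
        result = a
        for _ in range(b - 1):
            result = result ** a
        return result

    print(tetrate(2, 4))                  # 65536 = 2^(2^(2^2))
    print(tetrate_left(2, 4), 2**(2**3))  # 256 256 -- collapses to a^(a^(b-1))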

The names tetration, superpower, and superdegree have also been used to refer to the hyper4 operator. (As a child I used the somewhat misleading name powerlog for hyper4, as in 2 powerlog 5 is 65536.)

Extension to reals : Now, suppose you want to calculate 2↑↑2.5 or π↑↑e. The above definition isn't too useful because the number after the ↑↑ has a fractional part. What we would need is a way to "extend" the hyper4 operator to real numbers. Unfortunately, this is tough to do in a way that meets the types of standards mathematicians generally want such things to have. I also know of no proof that such extension is impossible. A lot of people have worked on this over the years, and if you're interested, I suggest you check my notes here, and the Tetration FAQ.

A "logarithm" for hyper4 : Another common question about hyper4 is how to perform the inverse operations — the equivalent of the "hyper4 logarithm" and "hyper4 root". There is no good answer for either one, until the problem of extending hyper4 to the reals is solved. The "hyper4 root" can be evaluated for fixed integer "root" values using Newton's method. For example, to take the "2nd hyper4 root", use this algorithm:

(given: number X, we want to find R such that R↑↑2 = X. Note that R↑↑2 = R^R.)

step   action                      notes
1.     R = ln(X)                   first approximation of answer
2.     Y = R^R                     calculate the function
3.     dY = Y (1 + ln R)           derivative with respect to R
4.     new R = old R + (X-Y)/dY    new approximation by Newton's method
5.     go back to step 2           repeat until accurate enough
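
Here is a minimal sketch of that iteration in Python (my own transcription of the steps above; it assumes X > 1 so the first approximation is positive, and the name super_sqrt is just an illustrative label):

    import math

    def super_sqrt(x, tol=1e-12, max_iter=200):
        # Find R such that R^R = x (the "2nd hyper4 root"), by Newton's method.
        r = math.log(x)                   # step 1: first approximation
        for _ in range(max_iter):
            y = r ** r                    # step 2: calculate the function
            dy = y * (1.0 + math.log(r))  # step 3: derivative of r^r
            r_new = r + (x - y) / dy      # step 4: Newton update
            if abs(r_new - r) < tol:
                return r_new
            r = r_new                     # step 5: repeat
        return r

    print(super_sqrt(3125.0))   # 5.0, since 5^5 = 3125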

The hyperlogarithm is intuitively similar to the "class number" (see my description of classes above) along with a fraction indicating how far through the class we are. It is very similar to the level-index representation and to the internal format used by my Hypercalc program. Here are some hyperlogarithm values (to base 10) using a definition from Trappmann's Tetration FAQ [15]:

hyperlog(2) ≈ 0.39
hyperlog(100) ≈ 1.39
hyperlog(10^100) ≈ 2.39
hyperlog(10^10^100) ≈ 3.39
...
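
The recursion behind these values is easy to sketch in code. The version below is my own simplification, using plain linear interpolation for the fractional part, so it gives hyperlog(2) ≈ 0.30 rather than the FAQ's 0.39; the overall structure (one unit per exponentiation level) is the same:

    import math

    def hyperlog(x):
        # Count how many times log10 must be applied to bring x down to 1,
        # plus a (linearly interpolated) fractional part.
        if x <= 0:
            raise ValueError("positive x only")
        if x <= 1:
            return x - 1.0   # base case: hyperlog(1) = 0
        return 1.0 + hyperlog(math.log10(x))

    print(hyperlog(100))         # 1.301...
    print(hyperlog(10.0**100))   # 2.301...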

Another function conceptually similar to this is the inverse Ackermann function:

α(x) = greatest n such that a1(n) < x

where a1(x) is as defined in the Ackermann section below. This inverse function grows more slowly, and is not an inverse of hyper4, but rather of the generalised hyper function.

The function "below" addition : Some people have also developed a hyper0 function. If you think about it, addition is a shortcut for counting, in much the same way multiplication is a shortcut for addition. The following definition for a hyper0 function was developed by Constantin Rubtsov:

hyper0(a,b) = a            (if b = -∞)
hyper0(a,b) = b            (if a = -∞)
hyper0(a,b) = a+2 = b+2    (if a = b)
hyper0(a,b) = a+1          (if a > b)
hyper0(a,b) = b+1          (if b > a)

This function, appropriately enough, is also the "successor" function used as the primitive computational element in algorithms defined in the Church theory of computation, which includes the original Ackermann function. For more on how this is done see my page on functional computation.
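
For ordinary (finite) arguments, Rubtsov's definition reduces to just a couple of cases; a minimal sketch in Python, leaving out the -∞ rules:

    def hyper0(a, b):
        # Rubtsov's hyper0 ("zeration") for ordinary integers.
        if a == b:
            return a + 2
        return max(a, b) + 1   # the larger argument, plus one

    print(hyper0(5, 3), hyper0(3, 5), hyper0(4, 4))   # 6 6 6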

The Hyperfactorial and Superfactorial Operators

These are single-argument functions like the factorial but producing higher values.

N.J.A. Sloane and Simon Plouffe use hyperfactorial to refer to the integer values of the K-function, a function related to the Riemann Zeta function, the Gamma function, and others. It is

H(n) = n^n × (n-1)^(n-1) × ... × 3^3 × 2^2 × 1^1

For example, H(3) = 27×4×1 = 108 and H(5) = 86400000. This function does not really grow much faster than the normal factorial function.
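
Since H(n) is a finite product, it is trivial to compute exactly with big integers; a quick sketch:

    def hyperfactorial(n):
        # H(n) = 1^1 * 2^2 * 3^3 * ... * n^n
        result = 1
        for k in range(1, n + 1):
            result *= k ** k
        return result

    print(hyperfactorial(3), hyperfactorial(5))   # 108 86400000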

In 1995, Pickover defined the superfactorial n$ (think of the dollar sign as a factorial sign with an S for "super" drawn on top of it) as follows:

n$ = n!^n!^n!^...^n!

where there are n! repetitions of n! on the right hand side (associated right-to-left, as a power tower). Using the hyper4 operator, n$ is equivalent to:

n$ = n!↑↑(n!)

There are other ways to define a higher version of the factorial, such as this and this.

To get an idea how big the hyperfactorial of a pretty normal number can be, read Wayne Baisley's wonderful article "Quantity Has A Quality All Its Own" (and bring your towel).

More Bowers Names

Jonathan Bowers, mentioned above, has many names covering this area. For example, in analogy to googol and googolplex he refers to 10↑↑100 as giggol and 10↑↑(10↑↑100) as giggolplex.

Higher hyper operators

Of course, the pattern of dyadic operators is easily continued:

operation   representation          absolute definition         inductive definition
hyper5      a^^^b  or  a↑↑↑b        a↑↑(a↑↑( ... ↑↑a))          a↑↑(a↑↑↑(b-1))
hyper6      a^^^^b  or  a↑↑↑↑b      a↑↑↑(a↑↑↑( ... ↑↑↑a))       a↑↑↑(a↑↑↑↑(b-1))

and so on.

Bowers has several named numbers in this area, including trisept, 7↑↑↑↑↑↑↑7 (seven up-arrows, hy(7,9,7)); tridecal, 10↑↑↑↑↑↑↑↑↑↑10 (ten up-arrows, hy(10,12,10)); and the aptly named boogol, the frighteningly large 10↑↑...↑↑10 with 100 up-arrows.

The First Triadic Operator: The Generalised "Hyper" Function

Since the dyadic operators all fall into a pattern, it is logical to define a triadic operator that combines them all. A triadic operator is a function that acts on three numbers, just as a dyadic operator acts on two numbers.

This new triadic operator is represented as a function with three arguments, and defined as follows:

hy(a,n,b) = { 1 + b                      for n = 0
            { a + b                      for n = 1
            { a × b                      for n = 2
            { a ^ b                      for n = 3
            { a ^ hy(a,4,b-1)            for n = 4
            { hy(a,n-1,hy(a,n,b-1))      for n > 4
            { a                          for n > 1, b = 1

The following definition is equivalent:

hy(a,n,b) = { 1 + b                      for n = 0
            { a                          for n = 1, b = 0
            { a                          for n > 1, b = 1
            { hy(a,n-1,hy(a,n,b-1))      for all other n > 0

and also note that:

hy(a,3,b) = a↑b = a^b
hy(a,4,b) = a↑↑b
hy(a,5,b) = a↑↑↑b
hy(a,6,b) = a↑↑↑↑b
etc.
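
The second definition above translates directly into code. A sketch in Python; it is only practical for tiny arguments, since the recursion ultimately reduces everything to repeated successor steps:

    def hy(a, n, b):
        # Generalised hyper operator, transcribing the second definition above.
        if n == 0:
            return 1 + b
        if n == 1 and b == 0:
            return a
        if n > 1 and b == 1:
            return a
        return hy(a, n - 1, hy(a, n, b - 1))

    print(hy(3, 3, 4))   # 81 = 3^4
    print(hy(2, 4, 3))   # 16 = 2^^3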

Generalising the Hyper Operator for Non-Integer n

Previously we looked at efforts to extend tetration to non-integer values, which in terms of this new "hy(,,)" function means computing hy(a,4,b) for non-integer values of b. Since there is now a third parameter to the function (the middle parameter n) it is natural to consider non-integer values of that too. For example, hy(a,2.5,b) would be a function "between" multiplication and exponentiation.

Gottfried Helms describes an approach to this interesting problem on Maths StackExchange here, but without explaining how to compute it. (He refers vaguely to a "Schröder-mechanism" and refers the reader to the tetration forum, without a specific link; he was possibly thinking of this discussion, which also relates the problem to the commutative hyperoperations of Albert Bennett.)

Bowers' Array Notation (3-element Subset)

At this point we return to the work of Jonathan Bowers to introduce his array notation. This notation is elegant, powerful, relatively easy to use and covers a greater range than any other discussed on these pages, within the limits of functional formal systems.

We will start by showing a very reduced version of the notation, which uses arrays of only 1, 2, or 3 elements. The rules for converting the notation into a number are:

1. For one- and two-element arrays, just add the elements. [a] = a and [a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's, remove them: [a,b,1] = [a,b] = a+b; [a,1,1] = [a].
3. If neither previous rule applies, and the 2nd entry is a 1, remove all but the first element: [a,1,n] = [a] = a.
4. There is no rule 4 (there will be when we get to bigger arrays).
5. Otherwise replace the array [a,b,n] with [a,[a,b-1,n],n-1], then go back and repeat the rules to expand it further.

With just a little effort you can see that these rules make [a,b,n] equivalent to hy(a,n,b) except for the special case of n=0. Compare the formula of rule 5:

[a,b,n] = [a,[a,b-1,n],n-1]

with the general case of the definition of the hyper function:

hy(a,n,b) = hy(a,n-1,hy(a,n,b-1))

They are the same except the order of the arguments is different. Bowers arranges the arguments in order of increasing "growth potential" — the operator has higher growth potential than b, so it goes last.

So, all 3-element Bowers arrays are equivalent to the normal hyper operators. [3,2,2] = hy(3,2,2) = 3×2 = 6; [3,2,3] = hy(3,3,2) = 3^2 = 9; [4,5,6] = hy(4,6,5) = 4↑↑↑↑5, etc.
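
The five rules are only a few lines of code; a sketch (like the hy() sketch earlier, only practical for tiny arrays):

    def bowers3(arr):
        # Evaluate a 1-, 2- or 3-element Bowers array by the rules above.
        arr = list(arr)
        if len(arr) <= 2:                       # rule 1: add the elements
            return sum(arr)
        while len(arr) > 1 and arr[-1] == 1:    # rule 2: strip trailing 1s
            arr.pop()
        if len(arr) <= 2:
            return sum(arr)
        a, b, n = arr
        if b == 1:                              # rule 3: [a,1,n] = a
            return a
        return bowers3([a, bowers3([a, b - 1, n]), n - 1])   # rule 5

    print(bowers3([3, 2, 2]))   # 6 = 3×2
    print(bowers3([3, 2, 3]))   # 9 = 3^2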

hyper operator variant: Knuth's Up-arrow Notation

The use of two or more carets (as in "a^^b" or "a^^^b") resembles a notation defined by Donald Knuth [17] in 1976 ("a↑↑b" and "a↑↑↑b" respectively), and is equivalent to the hyper operator. Carets are commonly seen in old ASCII sources such as mailing lists from the early days of USENET, but Knuth used real arrows: a↑↑b and a↑↑↑b instead of a^^b or a^^^b.

a ↑↑ b = hy(a,4,b)
a ↑↑↑ b = hy(a,5,b)
a ↑↑↑↑ b = hy(a,6,b)
(etc.)

Using the hy() function allows for a more compact representation of really large numbers that would otherwise take a lot of arrows. For example, hy(10,20,256) is equivalent to

10 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ 256

(18 up-arrows, since hy(a,n,b) corresponds to n-2 arrows)

and hy(256,625,4096) would be very unwieldy. Bigger numbers like hy(256,4294967296,256) can't be written at all.

This up-arrow notation is used in defining the Ackermann numbers

A(n) = n ↑↑↑...↑↑↑ n (with n up-arrows) = hy(n, n+2, n)

which are related to the Ackermann function described below.

In 2010 Knuth informed me [51] that he has found "the Ackermann-like 'arrow notation' in a 19th century encyclopedia."

Partial Ordering for Knuth Up-Arrows

One may speculate on the general problem of determining which is the larger of two values a↑↑↑...↑↑↑b and x↑↑↑...↑↑↑y. We can begin to answer that question for small numbers of up-arrows. In particular (for later discussion) we care about the answer when a, b, x and y are positive integers.

First, note that if a is 1, a↑↑↑...↑↑↑b is just a power of 1, which is always 1. Also, if a and b are 2 then the value of a↑↑↑...↑↑↑b is 4, regardless of the number of arrows.

With a single arrow, a↑b is the familiar exponentiation operation.

2↑3 is smaller than 3↑2
2↑4 is the same as 4↑2
for any other a and b, if b is greater than a then a↑b is greater than b↑a
in general, to compare a↑b to c↑d we can probably calculate both directly, so long as all four numbers are class 1.

With two arrows, a↑↑b is a "power-tower" of height b. Using Hypercalc it is relatively easy to compile a list of a↑↑b for all the smaller values of a and b, plus some larger values of a. Here I'll also show the smaller values of c^d that are not expressible in the form a↑↑b, to see where they fit in:

1↑↑a = 1, for all a
2↑↑1 = 2
3↑↑1 = 3
2↑↑2 = 2^2 = 4 = 4↑↑1
5↑↑1 = 5
6↑↑1, 7↑↑1, etc. (a↑↑1 = a, for all a)
2^3
3^2
2↑↑3 = 2^(2^2) = 2^4 = 4^2 = 16
5^2
3↑↑2 = 3^3 = 27
2^5
6^2, 7^2
2^6 = 8^2
3^4 = 9^2
10^2, 11^2
2^7
12^2 through 15^2 and 3^5
4↑↑2 = 4^4 = 2^8 = 256
17^2, etc.; 2^8, etc.; 3^6, etc.
5↑↑2 = 5^5 = 3125
56^2, etc.; 15^3, etc.; 2^12, etc.; 3^8, etc.
6↑↑2 = 6^6 = 36^3 = 216^2 = 46656
217^2, etc.
2↑↑4 = 2^16 = 4^8 = 16^4 = 256^2 = 65536
7↑↑2 = 7^7 = 823543
.. 8↑↑2, 9↑↑2, through 11↑↑2
3↑↑3 = 3^(3^3) = 3^27 = 7625597484987
12↑↑2 = 12^12 = 8916100448256
.. 13↑↑2 through 80↑↑2
4↑↑3 = 4^(4^4) = 4^256 ~ 1.3408×10^154
81↑↑2 = 81^81 ~ 3.8662×10^154
.. 82↑↑2 through 758↑↑2
5↑↑3 = 5^(5^5) = 5^3125 ~ 1.911×10^2184
759↑↑2 = 759^759 ~ 1.269×10^2186
.. 760↑↑2, etc.
2↑↑5 = 2^(2↑↑4) = 2^65536 ~ 2.004×10^19728
5298↑↑2 = 5298^5298 ~ 2.214×10^19730
.. 5299↑↑2, etc.
6↑↑3 = 6^(6^6) = 6^46656 ~ 2.659×10^36305
7↑↑3 = 7^(7^7) ~ 3.76×10^695974
.. 8↑↑3 through 11↑↑3
3↑↑4 = 3^(3↑↑3) ~ 1.35×10^3638334640024
12↑↑3 = 12^(12↑↑2) ~ 5.85×10^9622088391634
.. 13↑↑3 through 80↑↑3
4↑↑4 = 4^(4↑↑3) ~ 10^(8.0723×10^153)
.. 81↑↑3 through 758↑↑3
5↑↑4 = 5^(5↑↑3) ~ 10^(1.336×10^2184)
.. 759↑↑3, etc.
2↑↑6 = 2^(2↑↑5) ~ 10^(6.03×10^19727)
6↑↑4 = 6^(6↑↑3) ~ 10^(2.07×10^36305)
.. 7↑↑4 through 11↑↑4
3↑↑5 = 3^(3↑↑4) ~ 10^(6.46×10^3638334640023)
...

A pattern emerges: except when a is 2 or when b is 2, the values of a↑↑b generally follow the rule:

If y is larger than b, x↑↑y will be larger than a↑↑b.

However there are exceptions for smaller b or moderately larger a: 12↑↑2 is larger than 3↑↑3; 81↑↑2 is larger than 4↑↑3; and similar things happen further along in the list.

But even including these smaller-b or larger-a cases, a more general pattern is seen, namely that increasing b by one always gives a value that is about 10 to the power of whatever we had before: 4↑↑3 ~ 1.3408×10^154, and 4↑↑4 ~ 10^(8.0723×10^153). This is related to the "power tower paradox".

It is also generally true that if b is 3 or more, all of the numbers of the form a↑↑b are larger than anything of the form c^d (with one arrow, and with "reasonably-sized" c and d). The smallest c^d bigger than 3↑↑3 is 12↑12; in order for c^d to be bigger than 4↑↑3 you need to go up to 81↑81, and so on.
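
This observation also gives a practical way to compare such numbers, and it is essentially what Hypercalc does internally: represent a value by how many nested 10^'s must be stripped off before what remains is small, together with that small remainder. Here is a rough sketch of the idea (my own simplified version, not Hypercalc's actual code):

    import math

    def tet_rep(a, b):
        # Approximate a↑↑b as a pair (n, x) meaning n nested 10^'s applied
        # to x -- Hypercalc's "n pt x".  Assumes a >= 2 and b >= 1.
        log10a = math.log10(a)
        c = math.log10(log10a)            # shift when one level is added
        n, x = 0, float(a)
        for _ in range(b - 1):
            if n == 0 and x * log10a < 15:
                x = float(a) ** x         # still fits comfortably in a float
            elif n == 0:
                n, x = 1, x * log10a      # switch to storing log10(value)
            elif n == 1:
                n, x = 2, x + c           # log10(a^V) = 10^(log10 V + c)
            else:
                n, x = n + 1, x           # deeper down, the +c is negligible
        return n, x

    # Pairs built this way compare lexicographically:
    print(tet_rep(4, 4))   # (2, 153.9...): 10^10^153.9 ~ 10^(8.07×10^153)
    print(tet_rep(2, 6))   # (2, 19727.7...): so 2↑↑6 > 4↑↑4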

Now let's make a similar list of a↑↑↑b examples, showing how the a↑↑b values fit in:

2↑↑↑2 = 2↑↑2 = 2↑2 = 4
2↑↑3, 3↑↑2 through 6↑↑2
2↑↑↑3 = 2↑↑(2↑↑2) = 2↑↑4 = 2↑2↑2↑2 = 2↑2↑4 = 2↑16 = 65536
7↑↑2 through 11↑↑2
3↑↑↑2 = 3↑↑3 = 3↑27 = 7625597484987
12↑↑2, etc.; 4↑↑3 through 80↑↑3; and 3↑↑4
4↑↑↑2 = 4↑↑4 = 4↑4↑4↑4 ~ 10^(8.0723×10^153)
all the rest of the a↑↑b in the list above
5↑↑↑2 = 5↑↑5 = 5↑5↑5↑5↑5 ~ 10^10^(1.33574×10^2184)
2↑↑↑4 = 2↑↑(2↑↑(2↑↑2)) = 2↑↑(2↑↑4) = 2↑↑16, a tower of height 16 (or 10↑10↑...↑6.03×10^19727 with eleven 10's at the beginning, which in Hypercalc is written "11pt6.03×10^19727")
3↑↑↑3 = 3↑↑(3↑↑3), a tower of height 7625597484987
4↑↑↑3 = 4↑↑(4↑↑4), a tower of height 10^(8.0723×10^153)
5↑↑↑3 = 5↑↑(5↑↑5), a tower of height 10^10^(1.33574×10^2184)
6↑↑↑3 = 6↑↑(6↑↑6), a tower of height 3pt2.0692×10^36305
7↑↑↑3 = 7↑↑(7↑↑7), a tower of height 4pt3.177×10^695974
8↑↑↑3 = 8↑↑(8↑↑8), a tower of height 5pt5.43×10^15151335
9↑↑↑3 = 9↑↑(9↑↑9), a tower of height 6pt4.09×10^369693099
10↑↑↑3 = 10↑↑(10↑↑10), a tower of height 7pt10^10000000000
.. 11↑↑↑3 through 13↑↑↑3
2↑↑↑5 = 2↑↑(2↑↑↑4), a tower of height 2↑↑16 ~ 11pt6.03×10^19727
14↑↑↑3 = 14↑↑(14↑↑14), a tower of height 12pt1.2735782×10^16
.. 15↑↑↑3 through 7625597484980↑↑↑3 (and perhaps 7625597484981↑↑↑3)
3↑↑↑4 = 3↑↑(3↑↑↑3), a tower of height 3↑↑↑3 ~ 7625597484984pt3638334640023.8
4↑↑↑4 = 4↑↑(4↑↑↑3), a tower of height 4↑↑↑3
.. 5↑↑↑4 through 13↑↑↑4
2↑↑↑6 = 2↑↑(2↑↑↑5), a tower of height 2↑↑↑5
.. 14↑↑↑4 through 7625597484980↑↑↑4 ...

Once again a pattern emerges: except when a is 2, the ordering is determined first by b and then a. It shouldn't be hard to believe that the same thing happens again for a↑↑↑↑b, a↑↑↑↑↑b, and so on for larger numbers of arrows. The exception when a is 2 really continues all the way, for example:

2↑↑↑↑3 = 2↑↑↑(2↑↑↑2) = 2↑↑↑4, a tower of height 16,
but 3↑↑↑↑2 = 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑(3↑3↑3) = 3↑↑(3↑27), a tower of height 3↑27, which is much larger

And so we have the:

General Rule for Partial Ordering of the hyper Operator:

If a, b, c, x, y and z are all "of reasonable size", then with few exceptions, when comparing hy(a,b,c) to hy(x,y,z):
  the one with more arrows (b versus y) is larger;
  when b = y, the one with the larger number on the right (c versus z) is larger;
  when b=y and c=z, the one with the larger number on the left (a versus x) is larger.

Detailed Rules for Partial Ordering of the hyper Operator:

When comparing hy(a,b,c) to hy(x,y,z):
  if a = x = 1, they are equal,
  if a is 1 and x is larger, then hy(x,y,z) is larger
  if x is 1 and a is larger, then hy(a,b,c) is larger
  if a = c = x = z = 2, they are equal,
  if y is larger than b, then hy(x,y,z) is larger
  if b is larger than y, then hy(a,b,c) is larger
  if b and y (the number of up-arrows) are the same, and if a and x are both larger than 2, then hy(a,b,c) is larger if c is larger than z, or hy(x,y,z) is larger if z is larger than c
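
These rules transcribe directly into a comparison function; a sketch that simply encodes them (and therefore inherits their known exceptions, such as some of the cases where a or x is 2):

    def compare_hy(a, b, c, x, y, z):
        # Compare hy(a,b,c) with hy(x,y,z) per the detailed rules above.
        # Returns +1, 0 or -1; a heuristic for "reasonably sized" integers.
        if a == 1 and x == 1:
            return 0
        if a == 1:
            return -1
        if x == 1:
            return 1
        if a == c == 2 and x == z == 2:
            return 0                      # hy(2,n,2) = 4 for every n
        if b != y:
            return 1 if b > y else -1     # more up-arrows wins
        if c != z:
            return 1 if c > z else -1     # then the right-hand argument
        return (a > x) - (a < x)          # finally the left-hand argument

    print(compare_hy(3, 4, 3, 4, 5, 3))   # -1: hy(4,5,3) has more arrows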

Proof Becomes Difficult

At this point we begin to encounter functions and definitions that are difficult to compare to one another, either because they are not very thoroughly worked out, or because it takes so much work to actually prove whether one grows more quickly than another.

However, in many cases we know how to compare things because someone has done the work to reconcile a given invented system with a rigidly specified function hierarchy.

Gödel Numbers

The Gödel number of G, Gödel's undecidable sentence, is probably around here somewhere (its value depends highly on what operators, functions, etc. are available to construct primitive-recursive statements in the formalised number theory system that the Gödel technique is applied to).

Length of Goodstein Sequences

For many years I had this topic here, but it has been moved to here, which is a lot closer to where it belongs.

Other Triadic Operators

A common trick that clearly generates faster-growing functions involves defining functions that take more than two arguments. We have seen how the hyper operator, our first triadic operator, easily covers everything all the dyadic operators can handle. This trend continues. Of course, all operators can be referred to as functions, and the dyadic operators are actually functions with two arguments.

The Steinhaus-Moser-Ackermann operators

The Ackermann function and the Steinhaus-Moser notation are both equivalent to a triadic operator that is somewhat more powerful than the hy(a,b,c) function above. The Ackermann function and Steinhaus-Moser are roughly equivalent to each other so we'll discuss them together.

Ackermann's Function

A recursive function first described by W. Ackermann in 1928 to demonstrate a property of computability in the field of mathematics, and also used more recently as an example of pathological recursive functions in computer science. There are many different versions of the function; for a complete description of each go here. I will use the version that is the simplest to convert to the hyper operators:

ack-rm(a,b) = { 2b                               for a = 1
              { 2                                for b = 1, a > 1
              { ack-rm(a-1, ack-rm(a,b-1))       for a,b > 1

which yields to analysis as follows:

ack-rm(1,b) = 2b
ack-rm(a,1) = 2
ack-rm(2,b) = ack-rm(1, ack-rm(2,b-1)) = 2 × ack-rm(2,b-1), and by induction, ack-rm(2,b) = 2^b
ack-rm(3,b) = ack-rm(2, ack-rm(3,b-1)) = 2^ack-rm(3,b-1), and by induction, ack-rm(3,b) = 2↑↑b
ack-rm(4,b) = ack-rm(3, ack-rm(4,b-1)) = 2↑↑ack-rm(4,b-1), and by induction, ack-rm(4,b) = 2↑↑↑b
and in general, ack-rm(a,b) = hy(2,a+1,b)

The example value most commonly cited is ack-rm(3,5) = 2↑↑5, which is 2^65536 ≈ 2×10^19728, a large class-2 number. Of course, as with Steinhaus-Moser notation it is easy to transcend the classes entirely.
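
A direct transcription in Python (a sketch; the recursion chain gets long, so the recursion limit must be raised even for modest arguments, and ack-rm(3,5) already has 19,729 digits):

    import sys
    sys.setrecursionlimit(100000)   # ack_rm(2, b) alone recurses b levels deep

    def ack_rm(a, b):
        # The variant of Ackermann's function defined above.
        if a == 1:
            return 2 * b
        if b == 1:
            return 2
        return ack_rm(a - 1, ack_rm(a, b - 1))

    print(ack_rm(2, 10))   # 1024 = 2^10
    print(ack_rm(3, 4))    # 65536 = 2↑↑4 = hy(2,4,4)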

At this point it is tempting to try to avoid the "triadic function requirement" noted above by defining a single-variable function, such as:

a1(n) = ack-rm(n,n)

While it seems that a1(n) grows "just as fast" as the ack-rm() function, this is not actually true. Each value of the first argument a in ack-rm(a,b) corresponds to a different finite ordinal in the fast growing hierarchy, while ack-rm(n,n) eventually exceeds all of those. This is similar to how x^2 eventually exceeds the linear functions kx for any constant k. So a1(n) grows faster than all of the fk(n) functions for finite k, and instead matches the ω-indexed function fω(n).

a1(n) is a convenient way of defining large numbers as a function of one variable, but actually computing those numbers involves the recursive definition of the function. When x>1, we have:

a1(x) = ack-rm(x,x) = ack-rm(x-1, ack-rm(x,x-1))

The problem here is that the arguments of the two ack-rm functions on the right are not equal to each other, and therefore we can't substitute from the definition of a1(n) to put the right side in terms of the a1() function. So this means you always need the two-argument version in order to actually get anywhere: the growth rate of the one-argument a1(x) depends on the existence of a two-argument function.

However, as seen above it is possible to reduce the Ackermann function to two arguments. Furthermore, it is the fastest-growing function you can get using two arguments, if the function is defined only in terms of calls to itself and a "successor function" f(x)=x+1.





Japanese readers should see: 巨大数論 (from @kyodaisuu on Twitter)

If you like this you might also enjoy my numbers page.


Robert Munafo's home pages on HostMDS   © 1996-2022 Robert P. Munafo.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.