# Large Numbers

First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 11)

## The Mega and the Moser

These numbers were constructed by Hugo Steinhaus and Leo Moser (in the late 1960's or earlier, exact dates unknown) just to show that it is easy to create a notation for extremely large numbers.

In the earlier "Steinhaus polygon notation", there were 3 special symbols: a triangle, a square, and a circle.

X inside a triangle equals X^{X}

X inside a square equals X inside X concentric triangles

X inside a circle equals X inside X concentric squares

Steinhaus called 2 inside a circle "Mega" and 10 in a circle "Megiston".

(These descriptions are in the 1969 edition of the book "Mathematical Snapshots"; I learned about Megiston at a university library in June 1979.)

Later, in what is now called "Steinhaus-Moser notation", the circle was replaced with a pentagon and higher-order polygons (hexagons, etc.) were added to the system.

As before, X inside a triangle equals X^{X}

As before, X inside a square equals X inside X concentric triangles

X inside a pentagon equals X inside X concentric squares

X inside a hexagon equals X inside X concentric pentagons

and so on.

The "Mega" is now represented by 2 inside a pentagon, and "Megiston" is 10 inside a pentagon. A new, much larger quantity called "Moser's number" is "2 inside a megagon", where a "megagon" is a polygon with a Mega sides.

Here is the notation in functional form:

sm(a,n,p) = { a^a for n = 1, p = 3 { { sm(a,a,p-1) for n = 1, p > 3 { { sm(sm(a,1,p),n-1,p) for n > 1and here are a few values:

sm(2,1,3) = 2^2 = 4 sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256 sm(2,1,4) = sm(2,2,3) = 256 mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3) = sm(256^256,255,3) = sm((256^256)^(256^256),254,3) = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ... megiston = sm(10,1,5) moser = sm(2,1,Mega)We can approximate the sm function in terms of the hy function for small values of p:

sm(n,1,3) = n^n = hy(n,3,n) = hy(n,4,2) sm(n,2,3) = (n^n)^(n^n) = n^(n*n^n) = n^(n^(n+1)) ~= hy(n,4,3+epsilon) sm(n,3,3) = sm(n,2,3)^sm(n,2,3) = (n^(n^(n+1)))^(n^(n^(n+1))) = n^(n^(n+1) * n^(n^(n+1))) = n^(n^(1+n+n^(n+1))) ~= n^(n^(n^(n+1+epsilon))) ~= hy(n,4,4+epsilon) by induction sm(n,x,3) ~= hy(n,4,x+1+epsilon) sm(n,1,4) = sm(n,n,3) ~= hy(n,4,n+1+epsilon) sm(n,2,4) = sm(sm(n,1,4),1,4) ~= sm(hy(n,4,n+1+epsilon),1,4) ~= hy(hy(n,4,n+1+epsilon),4,hy(n,4,n+1+epsilon)+1+epsilon) "intuitively", sm(n,1,4) ~= hy(n,4,n+1+epsilon) sm(n,1,5) ~= hy(n,5,n^2)The value of Mega can be computed by hypercalc's internal BASIC interpreter with the following code:

10 let mega = 256; 20 for n=1 to 256; 40 let mega = mega ^ mega; 80 next n 160 print "Mega = ", mega 320 endThe first few steps of this generate the numbers:

256 = 2^{8} = 2^{23}

256^{256} = 2^{2048} = 2^{211} ≈ 3.231700607153×10^{616}

(256^{256})^{(256256)} = 2^{22059} ≈ 10^{(1.992373902866×10619)}

[(256^{256})^{(256256)}]^{[(256256)(256256)]} =
2^{22059+22059} ≈ 10^{10(1.992373902866×10619)}

Each time through the loop there are twice as many 256's — so there
are 2^{256} 256's in mega. However, the parentheses are grouped
differently from the power towers discussed above. After two times
through the loop, for example, it's (256^{256})^(256^{256}). That is
not as big as 256^{(256(256256))} — the former is
10^{(1.992373902866×10619)}, the latter is
10^{[10(7.78271055807×10616)]}. This discrepancy continues,
with the result that the mega is about
10^(10^(10^...^(1.992373902866×10^{619})...)), with 255 10's before
the "1.992373902866×10^{619}" part. For reasons explained
here, that is equivalent to what you get if you replace
all but the last few 10's with nearly any other number between say 2
and 1000. hypercalc's final answer is:

255 PT ( 1.992373902865×10^{619} )

which represents a power tower with 255 powers of 10 ("255 P.T.") and
1.992373...×10^{619} at the top.

## Fast Growing Hierarchies

We have a lot of larger finite and well-defined numbers to come. As we proceed it becomes steadily harder to put them in ascending order. This is where it becomes really useful to use a formally-defined function hierarchy, as discussed earlier.

The (unfortunately somewhat poorly-named) phrase "fast growing hierarchy" refers to a family of construction and proof techniques that together can be used to demonstrate a lot of useful results that are important for working out the relative sizes of the large numbers to be discussed below.

Many examples of function hierarchies are possible, depending on the purpose for which they are to be used. An often-used example is the Hardy hierarchy, which begins (using n and m to refer to finite integers):

H_{0}(n) = n

H_{1}(n) = n+1

H_{2}(n) = n+2

H_{m}(n) = n+m

H_{ω}(n) = H_{n}(n) = 2n

H_{ω+1}(n) = H_{ω}(n+1) = 2n+2

H_{ω+2}(n) = H_{ω+1}(n+1) = H_{ω}(n+2) = 2n+4

H_{ω+m}(n) = 2n+2m

. . .

H_{ω×2}(n) = 4n

H_{ω×3}(n) = 8n

H_{ω×m}(n) = 2^{m}×n

H_{ω2}(n) = 2^{n}×n

. . .

H_{ω3}(n) ≈ n↑↑{n}

H_{ω4}(n) ≈ n↑↑↑{n}

H_{ω5}(n) ≈ n↑↑↑↑{n}

H_{ωω+1}(n) ≈ {n,n,n} = n{n}n

. . .

In the last line here, "n{n}n" is using the Bowers abbreviation for n up-arrows, and "{n,n,n}" is a 3-element Bowers array.

Another function hierarchy popular with "googologists" is commonly called "the" fast-growing hierarchy, and often that name is used without regard to the fact that there is no single definition of a structure to derive all the functions that the googologists are trying to use under that name. Nevertheless, there is agreement on how the hierarchy begins, and it relates closely to the Goodstein sequences we will discuss next. Here is how it begins:

f_{0}(n) = n+1

f_{1}(n) = f_{0}^{n}(n) = n×2

f_{2}(n) = f_{1}^{n}(n) = n×2^{n}

f_{3}(n) = f_{2}^{n}(n) = n×2^{n}×2^{n×2n}...
(with n terms in the series)

f_{4}(n) ≈ n↑↑↑n

f_{5}(n) ≈ n↑↑↑↑n

. . .

These functions fall into a pattern that should look familiar:

*f_{0}() performs addition K+n,

f_{1}() performs multiplication K n,

f_{2}() slightly exceeds performs exponentiation K^{n},

f_{3}() slightly exceeds tetration or "hyper4" k↑n

. . .

And the next functions after that are equivalent in growth rate to
the higher "hyper" operators. We also see that the f() functions with
integer subscripts f_{i}() are similar to the Hardy functions with
subscript ω^{i}.

As in our earlier examples there are an infinite number of functions with integer indices like this followed by many more with cardinal infinite indices.

This function hierarchy is quite well described in the Numberphile video TREE vs Graham's number.

That's enough on this for now, but we'll revisit this rich topic later.

## Goodstein Sequences

Reuben Goodstein in 1944 proved [32] a remarkable theorem about the finite limit of an iterative process that seems to most observers to be destined to go on forever.

Higher than any of the finite-indexed f_{n}() functions, but not
quite as high as the single-argument Ackermann function, is gw(n),
the length of the weak Goodstein sequence starting with n. After we
discuss that we'll move on to other things, and eventually come back
to Goodstein sequences again using the "strong" definition.

"Goodstein sequence of n" = the sequence of integers constructed as in Goodstein's Theorem from a starting value of n.

The length of a Goodstein sequence, and the highest value achieved, are both guaranteed to be finite, and are given by a fast-growing function in n. Two definitions are common:

gw(n) = highest value in the Goodstein sequence of n.

gw(n) = number of terms in the Goodstein sequence of n (counting from the initial term equal to n and the first term equal to 0); or equivalently: the value of the base b of the term with a value of 1.

There are two ways to do this, "weak" and "strong". We'll start with the "weak" version, where the numbers grow less quickly and the sequence gets to 0 sooner.

Start with a number, like 7; and start with base 2. Write out the number "in base 2" by breaking it up into a sum of "digits" multiplied by powers of 2:

7 = 1×2^{2} + 1×2^{1} + 1×2^{0}

Now "increase the base to 3" by changing the 2's to 3's (but leave the exponents alone):

1×3^{2} + 1×3^{1} + 1×3^{0}

Now interpret this as a number, which is the new (larger) value:

1×3^{2} + 1×3^{1} + 1×3^{0} = 13

Now subtract one:

12 = 1×3^{2} + 1×3^{1} + 0×3^{0}

12 is the second term in the sequence, and 3 is the base.

Now repeat the process: Change the 3's to 4's, compute a new value, and subtract one:

1×4^{2} + 1×4^{1} + 0×4^{0}

1×4^{2} + 1×4^{1} + 0×4^{0} = 20

19 = 1×4^{2} + 0×4^{1} + 3×4^{0} ←*

Repeating this a few more times, we get:

27 = 1×5^{2} + 0×5^{1} + 2×5^{0}

37 = 1×6^{2} + 0×6^{1} + 1×6^{0}

49 = 1×7^{2} + 0×7^{1} + 0×7^{0}

63 = 7×8^{1} + 7×8^{0} ←*

Clearly every time we make a new term in the sequence, the base is increasing by one. This means that if we want to know the length of the sequence we can just look at the value of the base. We'll do this because it makes it easier to work out how long these sequences will be. It also means:

Every time the b^{0} term of the sum decreases, b becomes b+1

We're just adding one, but we will say that there is a function f(b) = b+1. Expressing it that way is more important than it may seem at the moment. Let's do the next 8 steps:

69 = 7×9^{1} + 6×9^{0}

75 = 7×10^{1} + 5×10^{0}

81 = 7×11^{1} + 4×11^{0}

87 = 7×12^{1} + 3×12^{0}

93 = 7×13^{1} + 2×13^{0}

99 = 7×14^{1} + 1×14^{0}

105 = 7×15^{1} + 0×15^{0}

111 = 6×16^{1} + 15×16^{0} ←*

Also, at this point, notice the lines marked with "←*". These are
times where the rightmost "digit", the b^{0} term of the sum, has
switched from 0×b^{0} to (b-1)b^{0}. Notice that the base is 4
the first time, and it is 8 the next time, and then 16 the time after
that. It doubles each time.

Here is another example, starting with 27 (in base 2, 11011_{2})
and writing things more concisely with digits
and the base as a subscript:

11011_{2} = 27_{10}

11011_{3} = 112_{10}. 11010_{3} = 111_{10}

11010_{4} = 324_{10}. 11003_{4} = 323_{10} ←*

11003_{5} = 753_{10}. 11002_{5} = 752_{10}

11002_{6} = 1514_{10}. 11001_{6} = 1513_{10}

11001_{7} = 2745_{10}.

11000_{8} = 4608_{10}. 10777_{8} = 4607_{10} ←*

10777_{9} = 7198_{10}.

10776_{10} = 10776

10775_{11} = 15570_{10}

10774_{12} = 21832_{10}

10773_{13} = 29838_{10}

10772_{14} = 39888_{10}

10771_{15} = 52306_{10}

10770_{16} = 67440_{10}. 1076F_{16} = 67439_{10} ←*

1076F_{17} = 85661_{10}

...

Again the lines marked with "←*" show where the lowest digit was a zero, and again the base doubles from 4 to 8 and then to 16.

This doubling is a universal phenomenon; it happens in any Goodstein
sequence even if we were to start with a base other than 2. The reason
is pretty easy to see: When it is 0×b^{0}, we're going to need to
subtract one from the next "digit" to the left, which takes one step
to do, and then we have b-1 in the rightmost "digit" and it will
therefore take another b-1 steps to get this down to zero. So we
spend exactly b steps, and during this time the base goes up from
b to 2b.

Every time we go from ... + k×b^{1} + 0×b^{0}
to ... + (k-1)×b^{1} + 0×b^{0}, b becomes 2b

This time there is a function g(b) = 2b, which is equivalent to repeating the function f(b) = b+1 a total of b times.

For similar reasons, the number of steps needed each time a b^{2}
digit decreases are expressed by a function h(b) that is a
repetition of the g() function:

Every time we go from ... + k×b^{2} + 0×b^{1} + 0×b^{0}
to (k-1)×b^{2} + 0×b^{1} + 0×b^{0},
b becomes 2^{b}b.

The new function h(b) is 2^{b}b, which is exactly what you
expect to get if you start with b and double it b times.

The f(), g(), and h() functions we have defined so far fall into a pattern that should look familiar:

f(n) = n+1 = f_{0}(n)

g(n) = 2n = f_{1}(n)

h(n) = 2^{n}n = f_{2}(n)

These are none other than the first three functions of the fast-growing hierarchy described earlier.

Of course, the next-higher digits will grow the sequence length at
a rate like that of tetration and the higher "hyper" operators. In
general, if our starting value is 2^{n} then the sequence will start
with 111111_{3} (with n 1's) and the length of the sequence will
involve the n^{th} hyper operator. Very approximately,

gw(2^{n}) ≈ 2↑↑...↑(2^{n}) (with n-2 arrows)

= hy(2, n, 2^{n})

The much more slowly growing "Goodstein numbers" g_{n}(n), as seen
in Sloane's sequence A266202, considers the n^{th} term in the
"Goodstein sequence of n" as defined above. So for example, when we
started with 7 and iterated 7 times we had 69 = 7×9^{1}, so 69 is
the 7^{th} member of sequence A266202 (the starting 0 is
considered the 0^{th} member of the sequence). The corresponding
sequence for the "strong" definition of Goodstein iteration is
A266201, and there the 7^{th} term is 37665879.

For more, you can skip ahead a bit to Goodstein sequences (strong).

First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 11)

Japanese readers should see: 巨大数論 (from @kyodaisuu on Twitter)

If you like this you might also enjoy my numbers page.

s.27