# Large Numbers

First page . . . Back to page 3 . . . Forward to page 5 . . . Last page (page 7)

## The Mega and the Moser

These numbers were constructed by Hugo Steinhaus and Leo Moser (in the late 1970's or earlier, exact date unknown) just to show that it is easy to create a notation for extremely large numbers. In the special "Steinhaus-Moser notation", numbers can have polygons (like triangles or squares) drawn around them:

X inside a triangle equals X^{X}

X inside a square equals X inside X concentric triangles

X inside a pentagon equals X inside X concentric squares

and so on.

2 inside a pentagon is called mega, and 2 inside a mega-gon is called moser.

(In an earlier version, by Steinhaus before Moser got involved, the pentagon is replaced by a circle, so mega is 2 inside a circle, and there are no higher polygons so moser is unrepresentable. Steinhaus also defines megiston as 10 inside a circle. moser is much bigger than megiston. Both versions of the notation predate 1983; I learned about the moser at an academic library in June 1979.)

Here is the notation in functional form:

sm(a,n,p) = { a^a for n = 1, p = 3 { { sm(a,a,p-1) for n = 1, p > 3 { { sm(sm(a,1,p),n-1,p) for n > 1and here are a few values:

sm(2,1,3) = 2^2 = 4 sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256 sm(2,1,4) = sm(2,2,3) = 256 mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3) = sm(256^256,255,3) = sm((256^256)^(256^256),254,3) = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ... megiston = sm(10,1,5) moser = sm(2,1,Mega)We can approximate the sm function in terms of the hy function for small values of p:

sm(n,1,3) = n^n = hy(n,3,n) = hy(n,4,2) sm(n,2,3) = (n^n)^(n^n) = n^(n*n^n) = n^(n^(n+1)) ~= hy(n,4,3+epsilon) sm(n,3,3) = sm(n,2,3)^sm(n,2,3) = (n^(n^(n+1)))^(n^(n^(n+1))) = n^(n^(n+1) * n^(n^(n+1))) = n^(n^(1+n+n^(n+1))) ~= n^(n^(n^(n+1+epsilon))) ~= hy(n,4,4+epsilon) by induction sm(n,x,3) ~= hy(n,4,x+1+epsilon) sm(n,1,4) = sm(n,n,3) ~= hy(n,4,n+1+epsilon) sm(n,2,4) = sm(sm(n,1,4),1,4) ~= sm(hy(n,4,n+1+epsilon),1,4) ~= hy(hy(n,4,n+1+epsilon),4,hy(n,4,n+1+epsilon)+1+epsilon) "intuitively", sm(n,1,4) ~= hy(n,4,n+1+epsilon) sm(n,1,5) ~= hy(n,5,n^2)The value of mega can be computed by hypercalc's internal BASIC interpreter with the following code:

10 let mega = 256; 20 for n=1 to 256; 40 let mega = mega ^ mega; 80 next n 160 print "Mega = ", mega 320 endThe first few steps of this generate the numbers:

256 = 2^{8} = 2^{23}

256^{256} = 2^{2048} = 2^{211} ≈ 3.231700607153×10^{616}

(256^{256})^{(256256)} = 2^{22059} ≈ 10^{(1.992373902866×10619)}

[(256^{256})^{(256256)}]^{[(256256)(256256)]} =
2^{22059+22059} ≈ 10^{10(1.992373902866×10619)}

Each time through the loop there are twice as many 256's — so there
are 2^{256} 256's in mega. However, the parentheses are grouped
differently from the power towers discussed above. After two times
through the loop, for example, it's (256^{256})^(256^{256}). That is
not as big as 256^{(256(256256))} — the former is
10^{(1.992373902866×10619)}, the latter is
10^{[10(7.78271055807×10616)]}. This discrepancy continues,
with the result that the mega is about
10^(10^(10^...^(1.992373902866×10^{619})...)), with 255 10's before
the "1.992373902866×10^{619}" part. For reasons explained
here, that is equivalent to what you get if you replace
all but the last few 10's with nearly any other number between say 2
and 1000. hypercalc's final answer is:

255 PT ( 1.992373902865×10^{619} )

which represents a power tower with 255 powers of 10 ("255 P.T.") and
1.992373...×10^{619} at the top.

## Friedman Sequences

In a 1998 paper^{14}, Harvey Friedman describes the problem of
finding the longest sequence of letters (chosen from a set of N
allowed letters) such that no subsequence of letters i through 2i
occurs anywhere further on in the sequence. For 1 letter {A} the
maximum length is 3: AAA. For 2 letters {A, B} the longest
sequence is 11: ABBBAAAAAAA. For 3 letters {A, B, C} the
longest sequence is very very long, but not infinite.

He then goes on to show how to construct proofs of lower bounds for
N-character sequences using certain special (N+1)-character
sequences. With help from R. Dougherty, he found a lower bound for the
N=3 case, A_{7198}(158386) = ack-rm(7198,158386) =
ack(7198,2,158386) = hy(2,7199,158386) = 2^{(7199)}158386, where
x^{(7199)}y represents the 7199^{th}
hyper operator.^{13}

This value is less than the Graham-Rothschild number and the other Graham's numbers (which we'll go into next). However, the N=4 case gives a result that is immensely bigger than all the versions of Graham's number. Friedman describes these relations at [46].

## The Graham-Rothschild Number

The original genuine "Graham's number", from a 1971 paper by Graham and Rothschild [34], is an upper bound for a problem in Ramsey theory, (graph coloring, combinatorics).

The problem is to determine the lowest dimension of a hypercube such
that if the lines joining all pairs of corners are two-colored, a
planar graph K_{4} of one color will be forced. (K_{4} is a
totally-connected graph with 4 edges, topologically equivalent to the
4 vertices and 6 edges of a tetrahedron. The planar requirement
means that its 4 vertices are in one plane, meaning that it can't be
just any 4 vertices of the hypercube, but diagonal planes are okay.
See the article by Sbiis Saibian [51] for an extremely
thorough description with lots of illustrations.)

In 1971 Graham and Rothschild proved that an answer exists using an upper bound F(F(F(F(F(F(F(12,3),3),3),3),3),3),3), where F(n) is defined:

F(m,n) = { 2^n for m=1, n>=2 { { 4 for m>=1, n=2 { { F(m-1,F(m,n-1)) for m>=2, n>=3

Graham and Gardner's 1971 description of the number |

Using the generalized hyper function, F(F(F(F(F(F(F(12,3),3),3),3),3),3),3) is equivalent to lgn(7), where lgn(n) is defined:

lgn(n) = { hy(2,12,3) for n = 1 { { hy(1, lgn(n-1), 3) for n > 1In other words, using up-arrows:

lgn(1) = 2↑↑↑↑↑↑↑↑↑↑↑↑3 (12 up-arrows)
= hy(2, 12, 3)

lgn(2) = 2↑↑↑...↑↑↑3 (with lgn(1) up-arrows)

lgn(3) = 2↑↑↑...↑↑↑3 (with lgn(2) up-arrows)

. . .

lgn(7) = 2↑↑↑...↑↑↑3 (with lgn(6) up-arrows)

This is the original number from Graham's proof.

As related by mathematician John Baez [50], Martin Gardner wanted to describe it the Mathematical Games column of Scientific American, but Graham and Gardner agreed that the definition was a little too complex, and so they invented an even larger number...

## The Graham-Gardner Number

This is the number commonly known as "Graham's number", but it is
more attributable to Martin Gardner than to Graham. The number
from the 1971 paper by Graham and Rothschild, described above, is an
"upper bound". The rule for upper bounds is that they are "true" so
long as the thing they're supposed to be "bounding" is provably
smaller. That means you can use a larger number for the upper bound,
and it's true too. If I tell you that 2^{2×3}=64 is an upper bound
for my problem, then it's safe for you to tell someone else that
2^{23}=256 is an upper boud for the problem, because 256 > 64.

So Gardner and Graham came up with the definition for a larger upper bound, which was popularized by Martin Gardner in 1977. This came to be known as "Graham's number", but I call it the Graham-Gardner number. Its value is is gn(64), where gn() is defined as follows:

gn(n) = { hy(3,6,3) for n = 1 { { hy(3, gn(n-1)+2, 3) for n > 1

This illustration from the Graham's number Wikipedia article is a popular way to illustrate the definition concisely:

The "curly braces" indicate that the number of up-arrows in each "layer" is counted by the number immediately below, with 4 arrows in the bottom layer.

Todd Cesere and Tim Chow have both proven that the Graham-Gardner Number is bigger than the Moser, and in fact even gn(2) is much bigger than the Moser. Tim's proof is outlined on a page by Susan Stepney, here.

Since gn(1) is 3↑↑↑↑3, which is smaller than 2↑↑↑↑↑↑↑↑↑↑↑↑3, it follows that the Graham-Rothschild Number is also bigger than Moser:

Moser << Graham-Rothschild << Graham-Gardner << Graham-Conway

This last variant, "Graham-Conway" is probably the least-known, but also the largest...

## The Graham-Conway Number

Conway and Guy's book The Book of Numbers [42] includes a description of a "Graham's number" which is inexact, and which differs from the other "Graham's number" definitions above. The exact text from [42] is:

Graham's number is
4↑↑ . . . ↑↑4, where the number of arrows is 4↑↑ . . . ↑↑4, where the number of arrows is . . . et cetera . . . (where the number of lines like that is 64) It lies between 3→3→64→2 and 3→3→65→2. |

(The last bit with the right-arrows is using Conway's Chained Arrow Notation). The problem with this description is that it doesn't tell you how many up-arrows you should start with. Sbiis Saibian has a thorough page on Graham's number [51] and calls this the "Graham-Conway number". He makes an educated guess that the number of up-arrows on the last 4↑..↑4 line should be the same as in the Graham-Gardner definition, i.e. 4 up-arrows:

Graham-Conway Number = Gc(64)

where Gc(n) is defined for all positive integer n as follows:

Gc(1) = 4↑↑↑↑4

Gc(n+1) = 4↑↑...↑↑4 with Gc(n) up-arrows

With this clarification, the Graham-Conway number is just like the Graham-Gardner number except that you start with 4↑↑↑↑4 instead of 3↑↑↑↑3. This definition has a sort of elegance in that everything is 4's, even the number of lines can be expressed as 64=4×4×4 is you're really trying to use 4's everywhere.

Clearly the Graham-Conway number is larger than the others, and it's the largest "Graham's number" I've seen, notwithstanding xkcd:

*How to horrify mathematicians*

This strip is using a 2-argument Ackermann function. There are several two-argument definitions in my listing Versions of Ackermann's Function, and we could choose any of them, so we may as well choose one that is easily turned into up-arrows, such as the one by Buck in 1963. This gives A(gn(64),gn(64)) = 2↑↑...↑↑gn(64), where the number of up-arrows is gn(64)-2. Thus it's "a little bigger" than gn(65). If xkcd wanted to horrify mathematicians a little more, they might be better off using a "single-argument" definition like A1(x)=A(x,x), then putting the A1 inside one of the gn()'s, like this:

gn(A1(gn(64))) = AAUGHHHH!!!

## Superclasses

Now let's take a break from the specific examples and functions for a moment to describe another type of "number class" that I have seen recently in the way people describe large numbers. We'll call these superclasses.

I first saw the concept in Eliezer Yudkowsky's description^{8}
of the Graham-Gardner number,
apparently adapted from the very similar
description by Martin Gardner^{16}. I'm adapting it further to
illustrate the superclasses.

(1) Let's start with 3↑3. This is 3^{3} = 27,
a number that is small enough for anyone to visualize.

(2) Next, consider 3↑↑3 = 3^{④}3. This is 3^{33} = 3^{27}
= 7625597484987, about 7 trillion. Even 7
billion would be too big for anyone to visualize, but most people
can understand it as being comparable to something familiar. (For
example, it is 1000 times more than the
world population but about a tenth the
number of cells in a human body.)

(3) 3↑↑↑3 or 3^{⑤}3, is 3^{④}(3^{④}3), which is
3^{④}7625597484987. That would be a power tower of 3's, 7 trillion
levels high. Now we are far beyond the ability to understand —
nothing in the real world is this numerous, and even combinations of
things (such as the number of ways to
shuffle every particle in the
known universe) only require, at most, 4 or 5 levels of a power-tower
to express. However, even though the quantity is beyond
understanding, the procedure for computing it can be visualized:
Start with n=1. Now replace n with 3^{n}. Replace n with
3^{n} again, and so on — repeat seven trillion times. This is a
procedure that one can visualize doing, and it can even be completed
in a reasonable amount of time using a fast computer and a suitable
representation format such as level-index.

(4) Now consider 3↑↑↑↑3, or 3^{⑥}3, the first step in
the Graham-Gardner Number definition. This is
3^{⑤}(3^{⑤}3), which is
3^{④}(3^{④}(3^{④}(...^{④}(3^{④}3))...)), a chain
of 7 trillion 3's and hyper4 operators. Every one of these
requires performing the exponent operation an unimaginable number of
times. So now, the procedure is beyond human ability to visualize,
although it can be understood. Start with n=3. Now replace n
with an exponential tower of threes of height n. Repeat
3^{⑤}3 - 2 more times, where 3^{⑤}3 is the huge number
whose calculation we visualized in the previous paragraph!
Martin Gardner, also writing about the
Graham-Gardner number, said:

3↑↑↑↑3 is unimaginably larger than 3↑↑↑3, but it is still small as finite numbers go, since most finite numbers are very much larger.

(5) We have just seen four numbers in a sequence: 3↑3, 3↑↑3, 3↑↑↑3, and 3↑↑↑↑3. Now consider the formula for the Graham-Gardner number. Begin with x = 3↑↑↑↑3, the number whose calculation procedure cannot even be visualized — then increase the number of up-arrows from 4 to x. Then increase the number of up-arrows to this new, larger value of x again. Then — repeat 61 more times! Here's what Yudkowsky had to say:

Graham's number is far beyond my ability to grasp. I can describe it, but I cannot properly appreciate it. (Perhaps Graham can appreciate it, having written a mathematical proof that uses it.) This number is far larger than most people's conception of infinity. I know that it was larger than mine. ...

This example illustrates the numbers (including those on my numbers page and on this page up to this point) can be seen as going from one level of abstraction to another. Here are the definitions of the superclasses:

Superclass 1: The number can be visualized. (Example: 27)

Superclass 2: The number cannot be visualized but it can be understood. (Example: 3↑↑3 ≈ 7 trillion)

Superclass 3: The number cannot be understood, but the procedure for computing it can be visualized. (Example: 3↑↑↑3)

Superclass 4: The procedure cannot be visualized, but the procedure can be understood. (3↑↑↑↑3)

Superclass 5: The procedure for generating the number is so abstract that it cannot be understood (by whoever's talking, by the author, by yourself, by some given group or people, etc.).

To complete the analogy to the normal classes, I will also add:

Superclass 0: The number can be experienced (whatever that means) by unsentient animals. (Example: 2)

Like the classes, the superclasses have important qualities that distinguish them, and this insight is what makes "superclass" a useful distinction. Each requires an additional amount of mental effort to ensure that the number is understood well anough to answer questions such as which number is larger, or which algorithm grows faster.

Notice that the first three (superclass 0, 1 and 2) are roughly equivalent to class 0, class 1 and class 2 above. The division points between them will be similar for most readers, but the upper range of superclass 2 will probably vary from one reader to another. I think I "understand" numbers about as big as the size of the visible universe, although I find it harder on some days than on others. There are probably people who have an understanding of such things as the size of the universe after inflation, or the size that would be needed for a spontaneous origin of life.

The first three superclasses can all be calculated fairly easily, using familiar techniques or modest towers of exponents — in the latter case sizes can easily be compared by looking at the number of levels in the tower and the value of the number at the top. Practical tools like hypercalc exist to actually work with these numbers.

Superclass 3 begins wherever your ability to "understand" ends.
However, the procedure for computing it can still be visualized.
Somewhere within superclass 3, the practical tools like hypercalc
stop working. At the top end of superclass 3, we are well beyond the
ability to work with actual numbers and have to start using symbols
and words, because the numbers outstrip the available tools. However,
it is still possible to make fairly simple proofs to show how one
function converts to another. An example is proving that the
higer-valued hyper4 operator a^{④}b grows about as quickly as
the lower-valued hyper5 operator a_{⑤}b.

With superclass 4, such proofs become more difficult. This is why it is a bit tougher to compare the Graham-Rothschild number to the Moser. It can be worked out by most who have time and patience, and the explanation can be followed and understood fairly easy, but it probably isn't too easy to remember.

With the Graham-Rothschild number (or its more well-known variant the Graham-Gardner number), and other numbers of similar size and larger which we are about to get into, most readers will have stepped firmly into superclass 5. The last paragraph in the Yudkowsky quote captures what superclass 5 is about: perception and understanding are so difficult that it is hard even to compare the number to infinity. I cannot say where your superclass 5 begins, but once you reach it, it is better to stop trying to grasp the quantities in any direct way and instead use rigorous deduction techniques, or trust others who have done so.

I have found my own boundary between 4 and 5 varies widely from one day to another. Clearly there are those (such as the people who worked on the numbers we are about to discuss) for whom "superclass 4" extends far higher than mine. I suggest it might be useful to define:

Superclass 6: The number is so big that no-one can understand its definition well enough to know anything useful about it.

In other words, superclass 6 is finite, but so large that no person, nor humanity as a whole, stands a chance of discovering anything useful about it. This happens with specific values of the Busy Beaver function beyond a certain point.

## Conway's Chained Arrow Notation

This notation is based on Knuth's up-arrow notation. In this notation, three or more numbers are joined by right-arrows. The arrow is not an ordinary dyadic operator. You can have three or more numbers joined by arrows, in which case the arrows don't act separately, the whole chain has to be considered as a unit. It might be thought of as a function with a variable number of arguments, or perhaps a function whose single argument is an ordered list or vector.

Here is Conway's notation, adapted from Susan Stepney's excellent web page on large numbers:

A. Three-number chains are equivalent to the generalized hyper operator:

a→b→c = hy(a, c+2, b) = a^{(c+2)}b

= a ↑↑..↑↑ b, with c up-arrows

B. A chain with a final 1 is equivalent to the same chain with the 1 removed:

a→b→...→x→1 = a→b→...→x

C. Conway is silent about the meaning of a two-element chain^{7},
but a definition is necessary. The most logical definition combines
the two previous rules and assumes that they work in reverse:

a^{b} = a↑b = a→b→1 (by rule A)

= a→b (by rule B)

therefore the reverse is true: a→b = a↑b = a^{b}

D. The last two numbers in a chain can be dropped if the second-to-last is a 1:

a→b→...→x→1→(z+1) = a→b→...→x

E. The last number in the chain can be reduced in size by 1 by taking the second-to-last number and replacing it with a copy of the entire chain with its second-to-last number reduced by 1:

a→b→...→x→(y+1)→(z+1)

= a→b→...→x→ (a→b→...→x→y→(z+1)) →z

F. If the last number in the chain is 2, rules D and E combine into:

a→b→...→x→ (y+1) → 2

= a→b→...→x→ (a→b→...→x→y→2) →1

= a→b→...→x→ (a→b→...→x→y→2)

= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2) →1 )

= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2))

and in general,

a→b→...→x→ (y+1) → 2

= a→b→...→x→ ( y times

a→b→...→x

) y times

G. A more general expansion of rule E follows:

a→b→...→x→2→ (z+1)

= a→b→...→x→ (a→b→...→x→1→(z+1)) →z

= a→b→...→x→ (a→b→...→x ) →z

a→b→...→x→3→ (z+1)

= a→b→...→x→ (a→b→...→x→2→(z+1)) →z

= a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z

a→b→...→x→4→ (z+1)

= a→b→...→x→ (a→b→...→x→3→(z+1)) →z

= a→b→...→x→ (a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z) →z

and in general,

a→b→...→x→ (y+1) → (z+1)

= a→b→...→x→ ( y times

a→b→...→x

) →z y times

The parentheses can only be removed after the chain inside the parentheses has been evaluated into a single number. Remember, the arrows are all taken together at once. You can't group them to evaluate, you can only reduce terms off the end using one of the rules above. a→b→c→d is not equivalent to (a→b→c)→d, a→(b→c→d), a→(b→c)→d, a→b→(c→d) or anything else like that.

The wikipedia article, Conway chained arrow notation, gives this set of rules which is equivalent to the above but probably a bit easier to use:

1. A single-number "chain" is equal to itself.

2. The chain a→b represents the value a^{b}.

3. If X is any chain (of one or more numbers), the chain X→a→1 is equivalent to X→a.

4a. The chain X→1→b for any value of b, is equivalent to X.

4b. The chain X→a→b for any values of a and b where both are greater than 1, is equivalent to X→(X→(a-1)→b)→(b-1).

4c. By repeating rule 4b we can see that any chain X→a→b with a and b both larger than 1 eventually turns into a chain X→c where "c" is a the value of the recursive construction X→(X→(X→(...(X→1→1)...)))→(b-1))→(b-1) with the number of nested parentheses depending on a.

Regarding 2's at the beginning

Note that 2↑↑↑...↑↑↑2 = 4 regardless of the number of up-arrows. Since any chain starting with 2→2→... will eventually end up as 2→2→c for some large c, it follows that any chain starting with 2→2→... is equal to 4.

Regarding 1's

Any chain starting with 1 eventually ends up as 1→a→b for some a and b, which is 1↑↑↑...↑↑↑a with b arrows. This is just a power of 1, so any chain starting with 1 is equal to 1.

Any chain with a 1 in it someplace else, like a→b→c→1→d→e→f, reduces to a→b→c→1→x for some x, and this then immediately reduces to a→b→c. So if you have anything with a 1 in it, you can remove the 1 and everything after it.

Some Correlations

The rules for Conway chained arrows express them in terms of Knuth's up-arrow notation and recursion; we can also express some of the other functions and numbers discussed above in terms of chained arrows:

Ackermann's function is equivalent to a 3-element chain:

A(m, n) = ( 2→(n+3)→(m-2) ) - 3

The Graham-Gardner number is not directly equal to a simple
chained-arrow representation, but its definition in terms of
the function gn(n) can be re-stated in terms of an interated
function f^{i}(n):

f^{1}(n) = f(n) = 3→3→n = 3↑↑..↑↑3 with n up-arrows

f^{2}(n) = f(f(n)) = 3→3→(3→3→n)

f^{3}(n) = f(f(f(n))) = 3→3→(3→3→(3→3→n))

f^{4}(n) = f(f(f(f(n)))), etc.

which gives the Graham-Gardner number G equal to f^{64}(4) because the
construction of Graham-Gardner starts with 3↑↑↑↑3 = 3→3→4, and we
increase the number of arrows 63 times.

Now start with f^{64}(1), and note that we can reverse rule 4c
above (with "3→3" as the value of X), and show that:

f^{64}(1) = 3→3→(3→3→(...3→3→(3→3→1))...)) with 64 sets of 3→3

= 3→3→(3→3→(...3→3→(3→3)→1)...)→1)→1

= 3→3→64→2

Similarly, we can start with f^{64}(27) and show that:

f^{64}(27) = 3→3→(3→3→(...3→3→(3→3→27))...)) with 64 sets of 3→3

= 3→3→(3→3→(...3→3→(3→3→(3→3)))...))

= 3→3→65→2

Since f^{64}(1) < f^{64}(4) < f^{64}(27), the Graham-Gardner number must be
between 3→3→64→2 and 3→3→65→2. See
here for a somewhat different discussion of
the same thing.

One may speculate on the general problem of determining which is the larger of two chains a→b→...→c and x→y→...→z. We can begin to make answer that question for some of the shorter chains (most of which is simply a restatement and re-ordering of the examples in my partial ordering for Knuth up-arrows):

First, as noted above we can remove any 1's and anything coming after a 1. Also, if the chain starts with 2→2 the whole thing can be replaced with 4.

For two-item chains, the following are clear:

2→3 is smaller than 3→2

2→4 is the same as 4→2

for any other a and b, if b is greater than a then a→b is
greater than b→a.

Now let's look at three-item chains:

For any a, the chain a→2→2 = a↑↑2 = a^{a}. So for these,
the one with a larger a is the larger chain.

If the first two items are the same, as in comparing a→b→c to a→b→d, both are like a↑↑↑...↑↑↑b but one has more arrows. So if a and b are both greater than 2, then the one with the larger d is larger.

Similarly if the first and last items are the same, as in comparing a→b→d to a→c→d, we are comparing two things with the same number of arrows (a↑↑↑...↑↑↑c versus a↑↑↑...↑↑↑d) and clearly the one with the larger number is larger.

A similar argument shows that if we are comparing a→c→d to b→c→d, where only the first number differs, the one with the larger first number is larger. This is a generalization of the a→2→2 case above.

2→3→2 = 2↑↑3 = 2↑2↑2 = 2↑4, but 2→2→3 = 2↑↑↑2 = 2↑↑2 = 2↑2 = 4. Since 2→2→anything is 4, it's clear that if we have two 2's, the arrangement with the larger-than-2 number in the middle is larger. Also, 3→2→2 = 3↑↑2 = 3↑3 = 27, so this is the largest of the three combinations of two 2's and a 3.

There are three ways to combine a 2 and two 3's:

3→3→2 = 3↑↑3 = 3↑27; and 3→2→3 = 3↑↑↑2 = 3↑↑3 = 3↑27, so these are the same.

2→3→3 = 2↑↑↑3 = 2↑↑2↑↑2 = 2↑↑4 = 2↑2↑2↑2 = 2↑2↑4 = 2↑16

Now let's look at the combinations of 2, 3 and 4. There are 6 of these, and I'll show them here with the largest first:

3→2→4 = 3↑↑↑↑2 = 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑(3↑3↑3) = 3↑↑(3↑27), a power-tower of height 3↑{27}.

2→3→4 = 2↑↑↑↑3 = 2↑↑↑(2↑↑↑2) = 2↑↑↑4 = 2↑↑(2↑↑(2↑↑2)) = 2↑↑(2↑↑4) = 2↑↑16, a tower of height 16.

2→4→3 = 2↑↑↑4, the same as 2→3→4 (a tower of height 16).

3→4→2 = 3↑↑4 = 3↑3↑3↑3, a tower of height 4.

4→2→3 = 4↑↑↑2 = 4↑↑4 = 4↑4↑4 = 4↑256.

4→3→2 = 4↑↑3 = 4↑4↑4 = 4↑256.

The thing to notice is that the winners have the 4 at the end, and among them the one with the 3 first is a lot larger. Let's try it again with bigger numbers all around:

4→3→5 = 4↑↑↑↑↑3 = 4↑↑↑↑(4↑↑↑↑4) = 4↑↑↑↑(4↑↑↑4↑↑↑4↑↑↑4)

3→4→5 = 3↑↑↑↑↑4 = 3↑↑↑↑(3↑↑↑↑(3↑↑↑↑3)) = 3↑↑↑↑(3↑↑↑↑(3↑↑↑3↑↑↑3))

Here it appears that 3→4→5 is going to end up being larger than 4→3→5. This is a reflection of the general rule for partial ordering of the hyper operator described above.

## Jonathan Bowers' Extended Operators

Jonathan Bowers has defined a whole series of notations that surpass everything mentioned so far. We can start by defining his extended operators, which are kind of analogous to the hyper operators. In fact, he starts with the hyper operators, which can be thought of as the original, "unextended" set of operators. Instead of putting the operator number inside a raised circle, he uses a pair of "angle-brackets":

a <n> b = a^{ⓝ}b = hy(a,n,b)

So, <1> is addition, a <1> b = a+b; <2> is multiplication and so on.

Using this notation makes it easier to make the operator itself a variable or expression, and unlike using the hy() function it retains the look of applying an operator (because the operator part is in the middle where it "belongs". For example:

a <1+2> b = a <3> b = a^{b}

gn(1) = 3 <6> 3

gn(2) = 3 <3 <6> 3> 3

Mega ≈ 256 <4> 2

Moser ≈ Mega <Mega+1> 2 ≈ (256 <4> 2) <256 <4> 2> 2

Here is his first extended operator:

a <<1>> 2 = a <a> a

a <<1>> 3 = a <a <a> a> a

a <<1>> 4 = a <a <a <a> a> a> a

and so on

a <<1>> b is "a expanded to b".

If you wish, you might prefer to represent this operator in a way similar to the higher hyper operators but with a 1 inside two circles (which I'll enlarge here for clarity):

a^{⓵}b

Since the two circles are hard to see in normal font sizes, I'll
instead use the hyper1 operator symbol `①` inside a set of
parentheses: a^{(①)}b.

Using this notation, the Graham-Gardner number is shown to be between
3^{(①)}65
and 3^{(①)}66 (the gn(x) function is as defined here):

3^{(①)}2 = 3 <3> 3

gn(1) = 3 <6> 3

3^{(①)}3 = 3 <3 <3> 3> 3 = 3 <27> 3

gn(2) = 3 <3 <6> 3> 3

and so on

3^{(①)}65

gn(64) = Graham-Gardner number

3^{(①)}66

Going back to Bowers' notation, here is the definition of the next extended operator. It is a right-associative recursive iteration of the first:

a <<2>> 2 = a <<1>> a

a <<2>> 3 = a <<1>> (a <<1>> a)

a <<2>> 4 = a <<1>> (a <<1>> (a <<1>> a))

and so on

a <<2>> b = a <<1>> (a <<2>> (b-1))

a <<2>> b is called "a multiexpanded to b".

Note the similarity of the definition a <<2>> b = a <<1>> (a
<<2>> (b-1)) to the corresponding definition for a hyper operator,
e.g. a^{②}b = a^{①}(a^{②}(b-1)).

The subsequent extended operators are defined in a similar way, each in terms of the previous.

a <<3>> b is called "a powerexpanded to b".

a <<4>> b is called "a expandotetrated to b".

Then Bowers defines a third series of extended operators, in a similar way:

a <<<1>>> 2 = a <<a>> a

a <<<1>>> 3 = a <<a <<a>> a>> a

a <<<1>>> 4 = a <<a <<a <<a>> a>> a>> a

and so on

a <<<1>>> b is "a exploded to b".

then the next operator:

a <<<2>>> 2 = a <<<1>>> a

a <<<2>>> 3 = a <<<1>>> (a <<<1>>> a)

a <<<2>>> 4 = a <<<1>>> (a <<<1>>> (a <<<1>>> a))

and so on

a <<<2>>> b is called "a multiexploded to b".

and so on:

a <<<3>>> b is called "a powerexploded to b".

a <<<4>>> b is called "a explodotetrated to b".

You can see that this generalizes easily to a function of four numbers, a, b, the number inside the angle-brackets and a number telling how many angle-brackets there are. This can be written as a function, f(a,b,c,d) or something like that — but Bowers wanted to go further.

First page . . . Back to page 3 . . . Forward to page 5 . . . Last page (page 7)

Japanese readers should see: 巨大数論 (from @kyodaisuu on Twitter)

If you like this you might also enjoy my numbers page.

s.11