Ravi is an armchair futurist and an aspiring mad scientist. His mission is to create simplicity out of complexity and order out of chaos.

Saturday, July 2, 2011

Hyperwebster - an uncountable dictionary

Introduction
There are an infinite number of points on the real line. In fact, there are more points on any segment of the real line, no matter how small, than all of natural numbers combined. This was proved by Cantor using the diagonal argument.

To get an idea of the number of points on a real line, Dr. Ian Stewart gave the following construction of an infinite dictionary, the "hyperwebster".

Construction of the Hyperwebster

An enterprising publishing company decides to print a book containing all the words that can possibly created from the English alphabet A-Z. Since it is really a collection of letters of the alphabet in any order, of any length, we will see:
• some non-sensical words like "XWPBQI" and "NKPMZ",
• some real words like "SUN" and "MOON" and
• even some composite words like "SUNMOON" and "MOONSUN".
Since not all such words have meaning, the publishing company decides to include only the words in the book, without their associated meaning, if any.

So the "Hyperwebster" book looks like this:
A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...

B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

The staff at the publishing company realizes that it can partition the words into 26 volumes, one for each letter of the alphabet. So the Hyperwebster now looks like the following:

Volume A: A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...
Volume B: B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

Volume C: C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

Volume Z: Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

Next, the staff realizes that all words in volume A start with the letter A, all words in volume B start with the letter B and so on. This means that the first letter in each word can be inferred from its volume and hence, the first letter can be dropped. Excellent! The publishing company just saved some ink by not printing an infinite number of letters.

The new volumes now look like this:

Volume A: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
Volume B: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
Volume C: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
Volume Z: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...

The staff realizes that each volume now looks identical, except for the name of the volume. Why would anyone buy 26 identical copies of the same content? So, the decision is made to publish a single volume called "Hyperwebster", which looks like the following:

New hyperwebster:
A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...

This turns out to be identical to the original hyperwebster that they started out with.

Original hyperwebster:
A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...

B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

The staff realizes that:

1. the original volume can be partitioned into 26 different volumes,
2. the first letter in each volume can be dropped, making each volume identical,
3. and each volume now is really identical to the original volume
4. and steps 1-3 can be applied ad infinitum.
The publishing company wisely abandons publishing the hyperwebster, even though each execution of steps 1-4 represent an infinite amount of savings!

Moral of the story
The content of the hyperwebster is equivalent to points on a real line (replace A-Z above with 0-9 and observe that it generates all real numbers). Any subset of the real line can be chopped up into infinitely many parts, each of which has the same number of points as the original. Each of the parts in turn can be chopped up into infinitely many subparts, each having the same number of points as the original, ad infinitum. Yeah, that's a lot of points! Continuum hypothesis states that the number of such points is aleph-1.

References
• Leonard M. Wapner, "The Pea and the Sun", 2005.

1. Brilliant concept! The Hyperwebster example is very apt and convincing. Are there any other real life examples where I can apply this?

1. No not yet though since it is possible mathematically only. But some scientists believe that it could be possible physically too.

2. Very cute! But if each word is finite, the set of words is actually only countably infinite. A countable union of finite sets is countable and for each finite length n, there are only finitely many words of length n.

1. Hi Alex, your statement is true! The hyperwebster argument only works when words are allowed to be infinitely long (e.g. AAA...) . And fortunately, there are uncountably many of them. :-)

3. why couldn't it be called the hyperoxford?

4. why couldn't it be called the hyperoxford?

1. Perhaps the Webster has more words in its dictionary that the Oxford! :)

2. I found this out on a Vsauce episode called: Bacach-Tarski paradox

3. *Banach Tarski

4. it should be called the hyperduden

5. Finally something about it !

this is the only page on google I found for the Hyper Webster !

thank you

I learned that it exists via Vsauce :

again, thaaaaaaaaaaanks a lot <3

6. An application to cryptography: the hyperwebster is equivalent to the cardinal of possible Plaintexts (in any codification like A..Z text, ASCII, binary, etc). So |P|=c if continuum-hypothesis rules.

7. Imagine a super computer printing this dictionary. It would
never print anything else than "a" words till the end of time.

1. Hi Hans,

To print "anything other than 'a' words", you can use a different enumeration method. You could enumerate by word length:

- All length=1 words ("A", "B", ..., "Z"),
- then all length=2 words ("AA", "AB", .., "AZ", "BA", "BB", ..., "BZ", "CA", ..., "ZZ") and so on.

8. The rule for filling in words is not clear. Also, it must be shown that the list of words is exhaustive.

1. Hi Thomas,

Re: enumerating words, you could enumerate by length. All length=1 words ("A", "B", ..., "Z"), then all length=2 words ("AA", "AB", .., "AZ", "BA", "BB", ..., "BZ", "CA", ..., "ZZ") and so on.

Re: exhaustive list, any word you choose will appear in this list. Following above listing method, if the length of your chosen word is n, it will appear within (26^0 + 26^1 + ... + 26^n) steps.

Cheers,
Ravi

9. I'm confused - If there is a finite number of letters in the English alphabet, how can there be an infinite number of possible combinations in the HyperWebster?

1. Because words in the Hyperwebster can be arbitrarily (i.e. "infinitely") long! (And every word of any given length is present in this dictionary.)

2. Would it not be simple to implement an interface that allows the user to choose the maximum word length that Hyper Webster will generate (of course a limit being added)?

3. And if you do not mind explaining (if you know yourself), do you know the algorithm for how they sequenced the characters in a way to find each of the possibilities? I'm trying myself to make something based on this system, but I can't seem to find an accurate function.

10. Pramerios
An algorithm is not needed. You must try to get your head around what is basically a simple concept- First word is a, last letter is z,,, ad infitum. The point of this hypothetical dictionary is to demonstrate infinity so your proposal to limit the number of letters defeats the objective. An error in the original text is the claim that "The publishing company just saved some ink by not printing an infinite number of letters" This is obviously wrong since infinity minus any number is still infinity.
What is really bizarre about this dictionary is that one word would, by shear chance, be the entire contents of War and Peace [with no spaces]and another nearby would be the same with a typographical error in the last word.It would be hard to pinpoint the specific word though, as you would be required to read the whole word to get the story and would not know whether the story you read deviated from the original unless you proof-read it alongside the original. It would also include a detailed biography of every person or indeed carbon atom that ever existed or any thing else you can imagine since there is an inexhaustable combination of letters.

11. This reminds me of Borges story "The Library of Babel". A library of books containing all the different combinations of letters, spaces and punctuations. https://en.wikipedia.org/wiki/The_Library_of_Babel

12. So can we call the Babel library a hyperwebster?

13. The crazy part is that the hyperwebster is so aggressively self-similar, that it can be reduced an infinite number of times, and still contain every entry.

You could delete the first two letters of every entry starting with "AA", and you'll find every entry in the hyperwebster before reaching "AB".

But then you could delete the first hundred letters of entries starting with one hundred A's, and find every entry in the hyperwebster before reaching AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB.

You could continue in this matter, removing any number of superfluous lines, without losing one bit of information.

Speaking of information, I wonder if there would be a way to calculate the signal to noise ratio within a hyperwebster. Handwaving the task of delineating information from noise, it seems that there would be an infinite number of meaningful entries, and an infinite number of useless entries.

Unless there is some trick of math to prove otherwise, there are as many single hyperwebster entries that contain every bit of text ever penned, arranged alphabetically in order of author's favorite spice, forward and then in reverse as there are entries that contain an arbitrarily large number of As.

1. So my guess is 1:1.