About Me

My photo
Ravi is an armchair futurist and an aspiring mad scientist. His mission is to create simplicity out of complexity and order out of chaos.

Sunday, August 14, 2011

Evaluating interviewers - Part 2

In this post, I show a method to mathematically evaluate an interviewer based on the job performance of the candidate that gets hired. This is a continuation of (but independent of) Evaluating Interviewers - Part 1, where I showed a method to evaluate an interviewer against other interviewers. I am replicating the definitions here from Part 1.

Definitions
SymbolDefinition
$C_i$ $i^{th}$ candidate
$R_j$ $j^{th}$ interviewer
$s_{ij}$ score for the $i^{th}$ candidate by the $j^{th}$ interviewer (this is the grade, usually between 1 and 5, given by the interviewer to the candidate based on the interview)
$m_i$ number of interviewers in the interview panel for candidate $i$ (the number of interviewers, usually between 4 and 8, that the candidate faces during the course of the interview process)
$n_j$ number of candidates interviewed by interviewer $j$ (can be large, in tens or hundreds, especially for popular interviewers)
$\hat{n_j}$ number of candidates interviewed by interviewer $j$ that joined the company/group
$p_i$ job performance of $i^{th}$ candidate after joining the company/group (usually between 1 and 5, captured in a company-internal HRM system)
$s_i$ average score given by the interview panel for the $i^{th}$ candidate, $s_i=\sum_{j}s_{ij}/{m_i}$ (usually between 1 and 5)

What we expect from interview scores
We take the interviewer's score $s_{ij}$ as a prediction about the candidate $C_i$'s job performance once hired. The higher the score, the better the predicted job performance. E.g., when an interviewer gives a score of $3.1$ to candidate $C_1$ and $3.2$ to $C_2$, in effect, he is vouching for candidate $C_2$ to out-perform candidate $C_1$, by a margin proportional to $0.1$.

Secondly, we expect job performance to be directly and linearly proportional to the score. E.g., if scores of $3.1$ and $3.2$ translate to job performance ratings of $3.1$ and $3.2$ respectively, then a score of $3.3$ should translate to a job performance rating of $3.3$ or thereabouts.

In other words, we expect the following from our scores:
  1. Ordinality: if $s_{aj}>s_{bj}$, then we hold interviewer $R_j$ to a prediction that candidate $C_a$ would outperform $C_b$ on the job.
  2. Linearity: job performance should be directly and linearly proportional to the score.
So we expect a plot of job performance (Y-axis) against interview score (X-axis) to be roughly linear for each interviewer, ideally along the $y=x$ line. We will discuss variations from this line and its implications later in the article.

Good interviewer v/s Bad interviewer

We classify an interviewer as good when there is high correlation between the score given by the interviewer to the candidate and the job performance of the candidate post-hire. The higher the correlation, i.e. the lower the variance, the better the interviewer. This is because a lower variance implies better predictability on part of the interviewer. Conversely, the higher the variance, the worse the interviewer.

Here is a graph of job performance (Y-axis) against interviewer score (X-axis) for a good interviewer:


Here is the graph for a bad interviewer. Notice the high variance, implying a low correlation between interview score and job performance:

Easy v/s Hard interviewers
Variation from $y=x$ line doesn't necessarily indicate a bad interviewer. For an interviewer to be bad, the correlation between interview score and job performance should be low.

Here is an example of a good interviewer with high correlation between interview score and job performance, but whose mean is different from $y=x$ line.
Note that the above graph satisfies both the ordinality and linearity conditions and hence the interviewer is a good interviewer. The above graph is for an "easy" interviewer - one who tends to give a higher score than those of his peers. Notice that the mean line hangs below the $y=x$ line.

Here is another example of an interviewer with high correlation between interview score and job performance, but whose mean is different from $y=x$ line.
This is a "hard" interviewer - one who tends to give a lower score than those of his peers. Notice that the mean line hangs above the $y=x$ line.

As opposed to the good interviewers, here are graphs for bad interviewers.

In the above case, the interviewer is an easy interviewer - one who tends to give a higher scores than his peers, as seen from the mean line (thicker one parallel to $y=x$ line). However, the low correlation suggests that the interviewer's score does not accurately portray job performance.

Here is another bad interviewer - this time a hard one - one who tends to give lower scores than his peers.

The above graphs show that both easy and hard interviewers can be good interviewers. And on the flip side, both easy and hard interviewers can be bad interviewers. What really distinguishes good from bad is how "tightly" the points hug the mean line in the graph. With this as the background, here is some math that will order interviewers in the descending order of "goodness".

The Math
  1. Find the line parallel to $y=x$ that serves as the mean for all points in the graph. There can be different definitions for "mean" here - e.g. one that is a mean of all $x$ and $y$ co-ordinates of the points, one that minimizes the sum of distances to each point, etc. For simplicity, we choose the mean of all $x$ and $y$ coordinates for that interviewer, i.e. $\overline{x}_j$ and $\overline{y}_j$ for interviewer $R_j$ respectively.
\[\overline{x}_j=\frac{\sum_{k}s_{kj}}{\hat{n_j}}\]
\[\overline{y}_j}=\frac{\sum_{k}p_k}{\hat{n_j}}\]
So the dark line in the graph corresponds to $y=f_j(x)=x+(\overline{y}_j-\overline{x}_j)$.
  1. We compute the standard deviation of interviewer $R_j$'s score, $\sigma_j$, as follows.
\[\sigma_j=\sqrt{\frac{\sum_k{(p_{i_k}-f_j(s_{i_kj}))^2}}{\hat{n_j}-1}}\]

where subscript $i_k$ is used to indicate a candidate that the interviewer interviewed and was eventually hired. So, essentially, we are determining the variance of the points with respect to the line $y=f_j(x)$. The lower the $\sigma_j$, the better the interviewer is at predicting the job performance of the candidate.
  1. Alternatively, instead of the above steps, we can compute the correlation coefficient between the interview scores and the job performance score.
  2. Order interviewers $R_j$ based on descending order of $\sigma_j$ (or the correlation coefficient). This is the list of interviewers - from the best to the worst - in that order!
In Closing
  • We outlined one approach to rank interviewers according to their ability to predict future performance of a job candidate.
  • There are many ways in which the "goodness" of an interviewer can be defined. Each can alter our algorithm.
  • There are many ways in which one can define average performance of the interviewer (the dark solid line in the graph). We choose a simple definition.
  • Regardless of the customization applied to our algorithm, the graphs and the rankings can help the organization better the interview process, thus:
  1. if an interviewer is deemed "bad", retrain them
  2. if an interviewer is deemed "easy", perhaps discount their score for the candidate by their variance, $\sigma_j$ to determine what a regular interviewer's score would have been for that candidate.
  3. similarly, for a "hard" interviewer, add their variance $\sigma_j$ to normalize their score and bring it up to par with other "regular" interviewers. 

Monday, August 1, 2011

Evaluating interviewers - Part 1

This post mathematically answers the question - "how to determine how good of an interviewer someone is".

We evaluate interviewers in 2 complementary ways:
  1. With respect to other interviewers (covered in this blog post)
  2. With respect to interview candidate's actual job performance after being hired (covered in the next blog post).
Each method has its strengths. For example, we usually have a lot of data for method 1 (since there are more candidates interviewed than hired), making it easier to evaluate an interviewer relative to other interviewers. However, the true test of any interviewer is the consistency with which they can predict a candidate's job performance should they be hired. This data may be hard to come by (or integrate, with HRM systems). But each method can be used independently, or collectively.

Definitions
(Note: subscript $i$ is used for candidates and $j$ for interviewers)

SymbolDefinition
$C_i$ $i^{th}$ candidate
$R_j$ $j^{th}$ interviewer
$s_{ij}$ score for the $i^{th}$ candidate by the $j^{th}$ interviewer (this is the grade, usually between 1 and 5, given by the interviewer to the candidate based on the interview)
$m_i$ number of interviewers in the interview panel for candidate $i$ (the number of interviewers, usually between 4 and 8, that the candidate faces during the course of the interview process)
$n_j$ number of candidates interviewed by interviewer $j$ (can be large, in tens or hundreds, especially for popular interviewers)
$\hat{n_j}$ number of candidates interviewed by interviewer $j$ that joined the company/group
$p_i$ job performance of $i^{th}$ candidate after joining the company/group (usually between 1 and 5, captured in a company-internal HRM system)
$s_i$ average score given by the interview panel for the $i^{th}$ candidate, $s_i=\sum_{j}s_{ij}/{m_i}$ (usually between 1 and 5)

Evaluating an interviewer w.r.t. other interviewers
Consider the random variable $X_j$ defined below, which is the difference between the score given by the $j^{th}$ interviewer and the average score given by the interview panel, for any candidate interviewed by $R_j$:
\[X_j=\{s_{ij}-s_i | R_j ~ \text{interviewed} ~ C_i\}\] We need to answer the following questions:
  • What does a probability distribution of $X_j$ look like?
  • What does the probability distribution of $X_j$ tell us about the interviewer $R_j$?
Before answering the above question, consider the following random variable, $X$: \[X=\sum_j{X_j}\]
This is the random variable for the difference between the score given by an interviewer and the mean of the score received by the candidate. Clearly, expectation of $X$, $E(X)=0$. Moreover, we expect $X$ to have a normal distribution. So we expect some $X_j$ to be centered to the left of 0, some to the right of 0, but most others around 0.

So the answers to the above questions are:
  • We expect the probability distribution of $X_j$ to be normal, centered around 0, on an average.
  • $E(X_j)$ tells us about the type of interviewer $R_j$ is:
    • $E(X_j)=0$ or more accurately, $|E(X_j)| < a\sigma$ (where $\sigma$ is the standard deviation of $X$, and $a>0$ is appropriately chosen) implies that the interviewer $R_j$ is a normal interviewer.
    • $E(X_j)\geq a\sigma$ implies that interviewer $R_j$ generally gives higher scores than average.
    • $E(X_j)\leq -a\sigma$ implies that interviewer $R_j$ generally gives lower scores than average.

Categorization of interviewers
From the above answers regarding $X_j$, we can categorize interviewer $R_j$ into a few distinct types:

The "regular" interviewer

Most interviewers' distribution of $X_j$ would look like the above. (Perhaps not as sharp a drop off as shown above around 0 - I just couldn't draw a good graph!) This indicates that most interviewers' scores would be close to the average score given by the interview panel.

The "easy" interviewer

The easy interviewer tends to give, on an average, higher scores than the rest of the interview panel. This causes the bulk of the graph to shift beyond 0. The farther this hump from 0, the easier the interviewer. If you are an interviewee, this is the kind of interviewer you want!

The "hard-ass" interviewer

The hard interviewer tends to give lower scores than the rest of the interview panel. This causes hump of the graph to go below zero. The farther below zero, the harder the interviewer. If you are an interviewee, this is the kind of interviewer you want to avoid!


The "activist/extremist" interviewer

Some interviewers tend to artificially influence the outcome of the interview process by giving extreme scores - like "definitely hire" or "definitely no hire". The graph would not be as sharp as the one depicted above, but the idea is that there would be an above-average frequency of extreme values.


The "clueless" interviewer

Some interviewers cannot correctly judge the potential of a candidate. Their scores would be all over the range, instead of a well-defined hump.

In closing
We presented a mathematical basis to compare interviewers. This yields a categorization of interviewers. In the next blog post, Evaluating Interviewers - Part II, we analyze how to evaluate an interviewer with respect to a hired candidate's job performance.

Sunday, July 10, 2011

Lebesgue Measure

Lebesgue measure is a measure of the length of a set of points, also called the $n$-dimensional "volume" of the set. So, intuitively:
  • 1-dimensional volume is the length of a line segment,
  • 2-dimensional volume is the area of a surface,
  • 3-dimensional volume is the volume that we know of
  • and so on.
These measures are defined over the space of real numbers $\mathbb{R}$, as opposed to, say, the space of natural numbers $\mathbb{N}$.

The basics
  • Lebesgue measure of a set of real numbers between $[a, b]$ is $b-a$. This is consistent with our intuitive understanding of the length of a line segment.
  • Lebesgue measures are finitely additive. So the measure of a union of the set of points $[a, b]$ and $[c, d]$ is $(b-a) + (d-c)$. I.e. $\text{measure}([a,b]\bigcup[c,d])=(b-a)+(d-c)$. This is again consistent with our intuitive understanding of the sum of two line segments.
  • In fact \[\text{measure}(\bigcup_{i=1}^{\infty} A_i)=\sum_{i=1}^{\infty}\text{measure}(A_i)\] The above applies to any $n$-dimensional measure (length, area, volume, etc.) This is consistent with our intuitive understanding of the sum of multiple line segments, areas, volumes, etc.
Those were the basics. So far so good. We now look at Lebesgue measure for:
  • countable sets (set of natural numbers, for example),
  • uncountable sets (set of real numbers) and
  • some seeming bizarre sets for which a Lebesgue measure cannot be defined.
Lebesgue measure for countable sets
  • Lebesgue measure of integer points on the real line is $0$. This is because points are just locations on a real line and intuitively, integers are few and far between on a real line.
  • Lebesgue measure of a countable set is $0$. This is a stronger statement than the above. It is interesting, since it implies that a set containing countably infinite number of points is also $0$. This is not surprising though, since between any two distinct integer points on the real line lie uncountably many real points. So intuitively, a countable set of points is just not large enough to have a non-zero Lebesgue measure!
  • One implication of the above is that rational numbers have a measure of $0$. This is in spite of the fact that rationals are dense - i.e. between any 2 rational numbers, there are countably infinite number of rationals! So, countable infinity is no match for Lebesgue measure.
Lebesgue measure for uncountable sets
  • As hinted earlier, Lebesgue measure for an uncountable number of contiguous points is non-zero. So for the set of points $[a, b]$, this is $b-a$.
  • Removing a countable number of points from an uncountable set does not change its measure. E.g the measure of the half open set $(a, b]$ (which excludes just the point $a$) or the full open set $(a, b)$, which excludes both points $a$ and $b$ is $b-a$.
  • One implication of the above is that if we were to remove, say, all rational numbers from the set $[a, b]$, its measure is still $b-a$. So we removed all countably infinite number of rationals, and yet the measure doesn't budge. Cool!
  • Lebesgue measure of an uncountable set need not always be non-zero. Georg Cantor gave the construction of an uncountable set, called the Cantor set, which has uncountably infinite points, and yet has a measure of $0$, as shown in the next section.
Cantor Set - an uncountable set with 0 measure
  1. Start with the set of points $\mathbb{R}[0, 1]$.
  2. Divide it into 3 parts - $[0, 1/3]$, $(1/3, 2/3)$, $[2/3, 1]$ and remove the middle part $(1/3, 2/3)$ from the set, as depicted in the diagram below. (Note: the first line in the diagram represents $\mathbb{R}[0,1]$ and those below it represent the construction of the Cantor set as detailed in this and subsequent steps.)
  1. Of the remaining parts, divide each into 3 parts, as shown in the diagram above. E.g.
    1. Divide $[0, 1/3]$ into $[0, 1/9]$, $(1/9, 2/9)$, $[2/9, 1/3]$ and again remove the middle part $(1/9, 2/9)$ from the set.
    2. Similarly, divide $[2/3, 1]$ into $[2/3, 7/9]$, $(7/9, 8/9)$ and $[8/9, 1]$ and remove the middle part, $(7/9, 8/9)$ from the set.
  2. Keep repeating the above step, as shown in the diagram above.
  • The segments that we end up removing have a combined measure of \[\frac{1}{3}+\frac{2}{9}+\frac{4}{27}+\frac{8}{81}\cdots=\frac{1}{3}\sum_{i=0}^{\infty}\bigg(\frac{2}{3}\bigg)^i=1\]
  • This means that the number of points left has a Lebesgue measure of 0, even though the number of points left is uncountable!
  • This is bizarre indeed. However, there is something even more bizarre. There are some sets that are not Lebesgue measurable! I.e. we cannot define a measure for them. A Vitali set is one such non-measurable set.
Vitali Set - a non-measurable set
Construction of this set relies on the Axiom of Choice and is slightly involved.
  1. Definition: We define 2 points, $a$ and $b$, as equivalent if $a-b$ is a rational number. We denote this as $a\sim b$. Note that $a$ and $b$ themselves, don't have to be rationals for them to be equivalent. Examples of equivalence are:
    1. $\frac{1}{2}\sim \frac{1}{3}$, since $\frac{1}{2}-\frac{1}{3}=\frac{1}{6}$, a rational number.
    2. $\pi\sim\pi+\frac{1}{2}$, since $\pi-(\pi+\frac{1}{2})=-\frac{1}{2}$, a rational number.
  2. We partition the set of real numbers $\mathbb{R}[0, 1]$ into equivalence classes using the above definition. All points within an equivalence class are equivalent to each other, of course.
  1. Each equivalence class has countably many points, one corresponding to each rational number.
  2. There are uncountably many such equivalence classes, one corresponding to each irrational number (that is not separated from another irrational number by a rational distance).
  3. By axiom of choice, we can construct a set containing one point from each of these equivalence classes, which we call $V$, the Vitali set.
  4. Note that since elements of $V$ are drawn from $\mathbb{R}[0,1]$, $\text{measure}(V)\leq 1$.
  5. Definition: We define a translation of $V=\{v_0,v_1,v_2,\cdots\}$ called $V_q=\{v_0+q,v_1+q,v_2+q,\cdots\}$ where $q\in\mathbb{Q}[0,1]$, the set of rationals between $[0,1]$. So for each rational number $q$, we have a corresponding uncountable set $V_q$ as shown in the diagram above.
  6. Note that $\text{measure}(V)=\text{measure}(V_q)\quad\forall q\in\mathbb{Q}[0,1]$, since each $V_q$ is merely a translation of $V$.
  7. Also note that \[\bigcup_{q\in\mathbb{Q}[0,1]}V_q=\mathbb{R}[0,2]\] i.e. we just partitioned $\mathbb{R}[0,2]$ into countably many partitions, $V_q, q\in\mathbb{Q}[0,1]$, with each partition containing an uncountable number of points.
  8. With points (8) and (9) in mind, what is the Lebesgue measure of $V$?
    1. If it is $0$, from (8), it means that each of $V_q$ has a measure of $0$. It implies, from (9), that $\text{measure}(\mathbb{R}[0,2])=0$ which is false, since $\text{measure}(\mathbb{R}[0,2])=2$.
    2. If it is greater than $0$, say $a$, then from (8), each of $V_q$ has a measure of $a$. Hence, from (9), since there are infinitely many $q$s, $\text{measure}(\mathbb{R}[0,2])=\infty$ which is false, since $\text{measure}(\mathbb{R}[0,2])=2$.
  9. Hence, $V$ is not Lebesgue measurable!
    In closing
    We saw that there are some sets with Lebesgue measure $0$ and some with non-zero. We also saw that there are some uncountable sets with measure $0$ and some sets with an undefined measure.

    References

    Saturday, July 2, 2011

    Hyperwebster - an uncountable dictionary


    Introduction
    There are an infinite number of points on the real line. In fact, there are more points on any segment of the real line, no matter how small, than all of natural numbers combined. This was proved by Cantor using the diagonal argument.

    To get an idea of the number of points on a real line, Dr. Ian Stewart gave the following construction of an infinite dictionary, the "hyperwebster".

    Construction of the Hyperwebster


    An enterprising publishing company decides to print a book containing all the words that can possibly created from the English alphabet A-Z. Since it is really a collection of letters of the alphabet in any order, of any length, we will see:
    • some non-sensical words like "XWPBQI" and "NKPMZ",
    • some real words like "SUN" and "MOON" and
    • even some composite words like "SUNMOON" and "MOONSUN".
    Since not all such words have meaning, the publishing company decides to include only the words in the book, without their associated meaning, if any. 

    So the "Hyperwebster" book looks like this:
    A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...

    B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

    C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

    Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

    The staff at the publishing company realizes that it can partition the words into 26 volumes, one for each letter of the alphabet. So the Hyperwebster now looks like the following:

    Volume A: A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...
    Volume B: B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

    Volume C: C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

    Volume Z: Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

    Next, the staff realizes that all words in volume A start with the letter A, all words in volume B start with the letter B and so on. This means that the first letter in each word can be inferred from its volume and hence, the first letter can be dropped. Excellent! The publishing company just saved some ink by not printing an infinite number of letters.

    The new volumes now look like this:

    Volume A: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
    Volume B: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
    Volume C: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...
    Volume Z: A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...

    The staff realizes that each volume now looks identical, except for the name of the volume. Why would anyone buy 26 identical copies of the same content? So, the decision is made to publish a single volume called "Hyperwebster", which looks like the following:

    New hyperwebster:
    A, AA, AAA, ..., B, BA, BAA, ..., C, ..., Z, ZA, ...

    This turns out to be identical to the original hyperwebster that they started out with.

    Original hyperwebster:
    A, AA, AAA, ..., AB, ABA, ABAA, ..., AC, ..., AZ, AZA, ...

    B, BA, BAA, ..., BB, BBA, BBAA, ..., BC, ..., BZ, BZA, ...

    C, CA, CAA, ..., CB, CBA, CBAA, ..., CC, ..., CZ, CZA, ...

    Z, ZA, ZAA, ..., ZB, ZBA, ZBAA, ..., ZC, ..., ZZ, ZZA, ...

    The staff realizes that:

    1. the original volume can be partitioned into 26 different volumes,
    2. the first letter in each volume can be dropped, making each volume identical,
    3. and each volume now is really identical to the original volume
    4. and steps 1-3 can be applied ad infinitum.
    The publishing company wisely abandons publishing the hyperwebster, even though each execution of steps 1-4 represent an infinite amount of savings!

    Moral of the story
    The content of the hyperwebster is equivalent to points on a real line (replace A-Z above with 0-9 and observe that it generates all real numbers). Any subset of the real line can be chopped up into infinitely many parts, each of which has the same number of points as the original. Each of the parts in turn can be chopped up into infinitely many subparts, each having the same number of points as the original, ad infinitum. Yeah, that's a lot of points! Continuum hypothesis states that the number of such points is aleph-1.

    References
    • Leonard M. Wapner, "The Pea and the Sun", 2005.

    Wednesday, June 29, 2011

    Continuum Hypothesis

    This post looks at Cantor's continuum hypothesis and the relevant historical results around it.

    Georg Cantor
    Georg Cantor defined infinity in two steps. First he defined an infinite set and then "infinity":

    • an infinite set is one that can be put in one-to-one correspondence with a proper subset of itself. E.g. natural numbers can be put in a one-to-one correspondence with the set of even numbers, which is a proper subset of the natural numbers. Here is the correspondence {0, 1, 2, 3, ... } maps to {0, 2, 4, 6, ...}.
    • The cardinality of such a set is "infinity".
    Cantor went a step ahead and defined a family of "infinities", each larger than the previous one.
    • Cantor proved that even an infinite set cannot be put in one-to-one correspondence with its power set (i.e. the set of all its subsets), whose cardinality is 2^n (if the cardinality of the original set is n).
    • This defines a family of "transfinite" cardinalities, each larger than the one before:
      • aleph-0 (the cardinality of natural numbers),
      • aleph-1 = 2^aleph-0,
      • aleph-2 = 2^aleph-1
      • and so on.
    • Real numbers cannot be put in one-to-one correspondence with natural numbers. (See Cantor's diagonal argument). The cardinality of real numbers is called the cardinality of the continuum, c.
    • Cantor hypothesized that c = aleph-1, which became known as the "Continuum Hypothesis". In other words, there is no transfinite cardinality between aleph-0 (cardinality of natural numbers) and c (cardinality of real numbers). But Cantor could not prove it. It turns out that there is a very good reason for that!
    • By the way, Cantor called natural numbers or any subset thereof as countable, since they can be counted, i.e. put in one-to-one correspondence with 0, 1, 2, 3, ... . He called real numbers and higher transfinite cardinalities as uncountable, because they cannot be put in one-to-one correspondence with 0, 1, 2, 3, ... . (See Cantor's diagonal argument).

    Kurt Godel
    Kurt Godel came along and gave the world two "incompleteness" theorems.
    • Informally, the first incompleteness theorem says that in any axiomatic system involving natural numbers, there are statements that cannot be proved or disproved within that system. In other words, there are some "undecidable" statements within the system.
    • Stated differently, we can never come up with a finite set of axioms that can prove or disprove every statement in that system. Hence the system is always "incomplete".
    • The intuition behind this is that there are only countably many provable statements, but uncountably many statements in the system. By the "pigeon hole" principle, some statements are unprovable. In fact, a vast majority of the statements are unprovable, since "uncountable" is much, much larger than "countable"!
    • The second incompleteness theorem says that for some axiomatic systems, consistency cannot be proved within the system itself.
    Axiom of Choice
    • Simply stated, the axiom of choice says that given a bunch of non-empty sets, it is always possible to choose one element from each set to construct a new set.
    • It seems logical and harmless. But complications arise when the original set has a large cardinality, e.g. aleph-1. How do we go about choosing one element from uncountably many sets? Where do we start - it cannot be put in one-to-one correspondence with natural numbers and hence cannot be labeled 1, 2, 3, ..., etc. So we wouldn't know which set is the first one, which is second and so on.
    • One way to go about this complication is to just assume that there exists such a choice set and circumvent the above complexity.
    • Not everyone agrees! So there are two different axiomatic set theories - one without the axiom of choice (ZF for Zermelo-Frankl, the formulators of set theory) and another with the axiom of choice (ZFC).
    • Godel proved that the Axiom of Choice was consistent with ZF. In other words, ZFC is consistent.

    The Finale
    • Godel also proved that the continuum hypothesis was consistent with axiomatic set theory. In other words, it cannot be disproved within ZF.
    • Paul Cohen comes along and proves that continuum hypothesis is independent of the other axioms in ZF. In other words, neither the continuum hypothesis nor its opposite can be proved within ZF. No wonder Cantor couldn't prove the "Continuum Hypothesis" from ZF axioms!
    • Well, what does it all mean? Continuum hypothesis must either be true or be false, since aleph-1 must be either equal to c or not equal to c. The bottom line is that the result by Cohen and Godel say that neither the equality nor the inequality can be proved within the axiomatic system, no matter how smart you are or how hard you try!
    • I, for one, believe that aleph-1 is indeed equal to c. In other words, there are no cardinalities in between that of the natural numbers and that of the real numbers.

    References
    • Leonard M. Wapner, "The Pea and the Sun", 2005.

    Friday, May 27, 2011

    Experiments in graph 3-coloring - block-cutpoint graph

    Introduction
    One way to reduce the runtime complexity of an algorithm is to partition the problem into smaller sub-problems, solve the sub-problems and combine these solutions to solve the original, bigger problem. This approach can be used when coloring a graph.

    I investigated an approach that uses a block-cutpoint representation of the given graph. On a high level, here is the solution:

    1. Given an undirected graph, construct its block-cutpoint graph.
    2. Solve the 3-coloring problem for each of the blocks.
    3. Combine the solution for each of the blocks to 3-color the original graph.

    Definitions
    Bi-connected graph
    An undirected graph is bi-connected if any two vertices lie on a cycle. In other words, there are at least 2 distinct, disjoint paths between any two vertices. This concept is similar to that of a strongly connected component in a directed graph.

    Block (aka Bi-connected component)
    A bi-connected subgraph of a graph is called a block.

    Cutpoint (aka cut vertex)
    A vertex whose removal disconnects a graph is called a cut point or a cut vertex. Such vertices are "on the boundary" of a block. In Figure 1, they are shown in red color.

    Block-Cutpoint graph

    • A graph where a block is collapsed into a vertex,
    • a cut vertex is represented as a vertex in this graph,
    • there is an edge between a block vertex and a cut vertex only if the cut vertex is part of the block,
    • there is an edge between two cut vertices only if there are adjacent in the original graph.



    Interesting point to note is that any undirected graph can be represented as a tree of blocks and cutpoints, as shown in the figure above.


    Coloring algorithm

    1. Partition given graph G into blocks B1, B2, ..., Bm.
    2. Color each of the blocks independently.
    3. The reconciliation step: two blocks, say B1 and B2, can share only 1 vertex (the cut vertex), say X.
      1. If X is colored with the same color, then we simply merge the colorings for B1 and B2.
      2. If X is colored differently in B1 and B2, say c1 and c2, respectively, then:
        1. we make X's color in B2 as c1 (its color in B1)
        2. All vertices in B2 colored as c2 are now colored as c1 and
        3. all vertices in B2 colored as c1 are now colored as c2.
      3. So in essence, we swapped colors c1 and c2 in B2.
    4. If we perform the reconciliation step in breadth first search order (BFS) of the block-cutpoint tree, we have a valid coloring for the original graph, provided each of the blocks has a valid coloring.


    Runtime complexity

    • Let the size of G be n.
    • Let the sizes of the blocks B1, B2, ..., Bm be n1, n2, ..., nm, respectively.

    Since the worst case running time is exponential in the number of nodes, say a^n for G, the runtime complexity of the above algorithm is a^n1 + a^n2 + ... + a^nm + O(n). This translates to O(a^max(n1, n2, ..., nm)) asymptotically.

    In closing
    This algorithm can be applied to n-coloring, not just 3-coloring. It reduces the time complexity if there are cut vertices in the graph. The more even the size of the blocks, the more the speed-up in run time.

    Wednesday, May 18, 2011

    Experiments in graph 3-coloring

    3-coloring using independent sets
    • By definition, vertices in an independent set (IS) share no edges. So these vertices can all be colored with one color.
    • If after the removal of these vertices (and incident edges), the graph is 2-colorable, then we have a valid 3-coloring.
    Fig 1: An independent set (red vertices)
    Fig 2: Graph without independent set
    Fig 3: Graph without independent set is 2-colorable
    Fig 4: Fully colored graph with 3 colors


    Interesting points about this approach
    • Not all independent sets work, since G - IS may not be 2-colorable.
    • Enumerating all independent sets is equivalent to enumerating the power set of the vertices and determining if they form an independent set. This has 2^v iterations, each taking about v^2 time. This is asymptotically better than 3^v iterations required for a brute force 3-coloring.
    • If a graph is 3-colorable, then we can always find one color in a valid 3-coloring that is applied to at most floor(v/3) vertices. This implies that when enumerating all independent sets, we need to only find independent sets of size floor(v/3) or less. This is still O(2^n), by the way.
    Optimization for this approach

    • We start with brute force enumeration, as below, where 1 implies inclusion of the vertex and 0 its exclusion from the independent set.
      • {1, 0, 0, ..., 0} - start by choosing v0
      • {1, 1, 0, ...., 0} - after we choose v0 and v1 in the independent set.
      • {1, 0, 1, ...., 0} - if we decide that v0 and v1 cannot be in an independent set (because they share an edge) and move on to v0 and v2.
    • In general, if current selection is an independent set, keep the latest added vertex and include the next available vertex.
    • In general, if current selection is not an independent set (because the latest vertex has one or more edges with previous vertices in the independent set), drop the latest added vertex and choose the next available one.

    3-coloring with brute force
    • This is exhaustive, brute force approach. We try each combination, till we find a valid coloring.
    • Let the colors be 0, 1 and 2. Each color combination is an n-tuple. E.g. {0, 2, 1, 0, 0, 1, 1, 2, ...}. This colors vertex v0 with color 0, v1 with color 2, v2 with color 1 and so on.
    • This is O(3^n), but seems to run faster in practice. Perhaps it has a tighter bound than O(3^n).
    Optimization for this approach
    • We order vertices arbitrarily as v0, v1, v2, ..., vn.
    • We enumerate current color selection lexicographically. E.g.
      1. 0
      2. 0, 0, if the previous selection is a valid coloring
      3. 0, 0, 0 if the previous selection is a valid coloring
      4. 0, 0, 1 if the previous selection is not a valid coloring
      5. 0, 0, 2 if the previous selection is not a valid coloring
      6. 0, 1 if the previous selection is not a valid coloring
      7. 0, 1, 0 if the previous selection is a valid coloring
      8. and so on.

    3-coloring greedily
    • This is a simple approach, very fast in practice, but cannot color all colorable graphs.
    • Order vertices by descending order of degree.
    • Greedily choose the lowest color that is valid.
    • Repeat above step till all vertices are colored.
    • Runs in O(v^2) time.