
Sunday, November 18, 2012

Redei's theorem

Redei's theorem states that every tournament has a directed Hamilton path. We prove this theorem in this blog post.

Background


Tournament

A tournament is a complete graph with oriented edges.
It can be viewed as the result of a round-robin tournament, where every team plays every other team and there is always a winner and a loser in every match (no ties). The direction of each edge is from the winner to the loser of that match. In the above graph, team 1 beat team 2, hence the edge $1\rightarrow 2$ and so on.

Hamilton path

A Hamilton path is a path that visits every vertex of a graph exactly once.
The directed path shown above in red is a Hamilton path, since it connects all vertices of the graph.

Now we are ready for the theorem and its proof.

Redei's theorem

Every tournament has a directed Hamilton path. This was first proven by Laszlo Redei in 1934.

Proof by induction

For the base case, consider a tournament on 2 vertices, say with the edge $v_1\rightarrow v_2$. This edge is itself a Hamilton path, since it covers both vertices. So the statement holds for our base case.

For the inductive step, we assume that every tournament on $(n-1)$ vertices has a Hamilton path. Assume that this path is $\{v_1,\cdots,v_{n-1}\}$ as shown in the graphs below. We consider 3 different scenarios for the new vertex $v$ added to this graph.

  1. In the first scenario, we have an edge $v\rightarrow v_1$, shown by the red edge in the graph below. The new path $\{v,v_1,\cdots,v_{n-1}\}$ is a Hamilton path. So for this scenario, a tournament on $n$ vertices does have a Hamilton path.
  2. In the second scenario, we have an edge $v_{n-1}\rightarrow v$, shown by the red edge in the graph below. The new path $\{v_1,\cdots,v_{n-1},v\}$ is a Hamilton path. So for this scenario too, a tournament on $n$ vertices does have a Hamilton path.
  3. In the final scenario, different from the previous two, we have both $v_1\rightarrow v$ and $v\rightarrow v_{n-1}$, as shown in the graph below. In this case, let $v_i$ be the first vertex on the path with an edge $v\rightarrow v_i$ (shown as a dotted edge); such a vertex exists because $v\rightarrow v_{n-1}$. Since $v_i$ is the first such vertex, we also have $v_{i-1}\rightarrow v$, so inserting $v$ between $v_{i-1}$ and $v_i$ gives the Hamilton path $\{v_1,\cdots,v_{i-1},v,v_i,\cdots,v_{n-1}\}$. (Note that $i$ could be $n-1$, the last vertex, if all edges from the preceding vertices go into $v$.) So for this scenario too, a tournament on $n$ vertices has a Hamilton path.

These three scenarios cover all the cases for the inductive step, completing the inductive proof of Redei's theorem that every tournament has a directed Hamilton path.
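
The inductive proof translates directly into an insertion algorithm: add vertices one at a time, placing each new vertex before the head, after the tail, or at the first position where it beats a vertex on the path. The sketch below (my own illustration, not part of the original post) assumes the tournament is given as a function `beats(u, v)` that returns True exactly when the edge is oriented $u\rightarrow v$.

```python
import random

def hamilton_path(vertices, beats):
    """Build a directed Hamilton path of a tournament by insertion.

    `beats(u, v)` must return True iff the edge is oriented u -> v.
    Mirrors the induction: the new vertex goes before the first path
    vertex it beats, or at the end if it beats none of them.
    """
    path = []
    for v in vertices:
        for i, w in enumerate(path):
            if beats(v, w):          # first vertex on the path that v beats
                path.insert(i, v)    # v_{i-1} -> v -> v_i keeps the path directed
                break
        else:
            path.append(v)           # v loses to every vertex on the path so far
    return path

# Example: a random tournament on 6 teams.
n = 6
orientation = {(i, j): random.random() < 0.5 for i in range(n) for j in range(i + 1, n)}
beats = lambda u, v: orientation[(u, v)] if u < v else not orientation[(v, u)]
print(hamilton_path(range(n), beats))
```

This uses $O(n^2)$ edge queries in the worst case, matching the structure of the induction.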

Conclusion

Using the analogy of matches in a round-robin tournament between $n$ teams, Redei's theorem says that it is always possible to find a chain of $n-1$ matches such that team A beat team B, which beat team C, and so on, down to team N. Now that was not obvious before! (Note: team A doesn't mean team 1; $\{A, B, \cdots, N\}$ is some permutation of $\{1, 2, \cdots, n\}$.)

References

  1. Bondy, J.A., Murty, U.S.R., Graph Theory, 2008.

Friday, September 14, 2012

Gale-Shapley algorithm as a picture

Here is a pictorial depiction of the Gale-Shapley algorithm:

Gale-Shapley algorithm


  • Each man $M_i$ is drawn as a square and each woman $W_j$ as a circle.
  • Each person's choices are arranged clockwise, in ascending order of desirability - i.e. the bottom choice first, followed by better and better choices, ending with the top choice.
  • According to the algorithm:
    • A man proposes to the highest-ranked woman he has not yet proposed to. So, in the diagram above, he goes through his choices anti-clockwise.
    • A woman accepts a proposal if the man is better than her current fiance. So, in the diagram above, she goes through her choices clockwise.
The following is apparent from the picture:
  1. As the algorithm progresses, every woman can only improve her situation. In other words, the worst she can do is her current fiance - it only gets better for her!
  2. As the algorithm progresses, every man's position can only get worse. In other words, "it is all downhill" for him.
  3. The reason for rejecting a man's proposal is always that the woman's current choice is better than him (for her) and it can only get better for her, implying that he cannot introduce instability for those women who have rejected him.
  4. Since a woman can be proposed to only once by each man, she can get no more than $|M|$ proposals. Since there are $|W|$ women and some woman is proposed to at each step, the algorithm has to terminate in $O(|M|\times|W|)$ time.
  5. When we have an equal number of men and women, each woman will be proposed to (eventually). This means that each woman will have a fiance (she must accept a proposal if she has no fiance). Since $|M|=|W|$ in this case, this algorithm runs in $O(n^2)$ time.
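
To complement the picture, here is a compact sketch of the algorithm itself (my own illustration, not from the post). It assumes preference lists are given best-first and that there are equal numbers of men and women.

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Stable matching via the Gale-Shapley algorithm (men propose).

    men_prefs[m]   : list of women, best first
    women_prefs[w] : list of men, best first
    Returns a dict mapping each woman to her fiance.
    """
    rank = {w: {m: r for r, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}    # index of the next woman m will propose to
    fiance = {}                                # woman -> current fiance
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]       # highest-ranked woman m has not proposed to
        next_choice[m] += 1
        if w not in fiance:
            fiance[w] = m                      # she must accept if she has no fiance
        elif rank[w][m] < rank[w][fiance[w]]:  # she trades up - it only gets better for her
            free_men.append(fiance[w])
            fiance[w] = m
        else:
            free_men.append(m)                 # rejected - it is all downhill for him
    return fiance

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))  # {'w1': 'm2', 'w2': 'm1'}
```

Each man proposes to each woman at most once, so the loop runs at most $|M|\times|W|$ times, matching observation 4 above.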

Friday, August 17, 2012

Schwartz-Zippel Lemma

The intuition behind this lemma is straightforward in the univariate case: a nonzero polynomial of degree $d$ can have no more than $d$ roots. In other words, at most $d$ values (or elements from a field, if you will) can satisfy the equation (unless, of course, the polynomial is identically zero, in which case every element satisfies it). The Schwartz-Zippel lemma extends this intuition to multivariate polynomials of total degree $d$.

Examples
  • $x^2-3x+5$ has total degree 2. So it can have no more than 2 roots.
  • $x^3+x^2y^2+y^3$ has total degree 4. By the lemma, if $x$ and $y$ are drawn at random from a finite set $S$, it evaluates to zero with probability at most $4/|S|$.

With the above intuition, it seems plausible that for any given finite set of elements (no matter how large or small), at most a $d/|S|$ fraction of the assignments drawn from that set can satisfy the equation. So, the probability that a random assignment from this set satisfies the equation is no more than $d/|S|$. And that is exactly the statement of this lemma.

Statement
Let $Q(x_1,\cdots,x_n)\in\mathbb{F}[x_1,\cdots,x_n]$ be a multivariate polynomial of total degree $d$. Fix any finite set $S\subseteq\mathbb{F}$, and let $r_1,\cdots,r_n$ be chosen independently and uniformly at random from $S$. Then \[\mathbf{Pr}[Q(r_1,\cdots,r_n)=0\hspace{1mm}|\hspace{1mm}Q(x_1,\cdots,x_n)\not\equiv 0]\leq\frac{d}{|S|}\]
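
As a quick illustration (my own sketch, not part of the original post), the lemma is what makes randomized polynomial identity testing work: to check whether two polynomial expressions are identical, evaluate both at a random point modulo a large prime and compare; repeat a few times to shrink the error.

```python
import random

P = 2**61 - 1  # a large prime, so we are working over the field Z_P

def probably_identical(f, g, n_vars, trials=20):
    """Randomized polynomial identity test based on the Schwartz-Zippel lemma.

    f, g: callables of n_vars integer arguments with integer coefficients.
    If f and g differ as polynomials of total degree d, one trial wrongly
    reports equality with probability at most d / P.
    """
    for _ in range(trials):
        point = [random.randrange(P) for _ in range(n_vars)]
        if f(*point) % P != g(*point) % P:
            return False          # found a witness: the polynomials differ
    return True                   # probably identical

# (x + y)^2 versus x^2 + 2xy + y^2 (identical), and versus x^2 + y^2 (not)
f = lambda x, y: (x + y) ** 2
g = lambda x, y: x**2 + 2 * x * y + y**2
h = lambda x, y: x**2 + y**2
print(probably_identical(f, g, 2))  # True
print(probably_identical(f, h, 2))  # False (with overwhelming probability)
```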

Sunday, July 1, 2012

Verifying equality of large data

The general pattern of verifying equality of two pieces of data, say $a$ and $b$, is comparing each bit in sequence, until we either:
  • find a mismatch (in which case, we declare that $a\neq{b}$) or
  • we reach the end (in which case, we declare that $a=b$).
With this approach, our answer is correct with 100% confidence. However, for this absolute confidence, we need $O(n)$ comparisons, which may be inconvenient as $a$ and $b$ can be arbitrarily large, e.g. 1 PB of data. Additionally, $a$ and $b$ may not necessarily be available at the same time (e.g. $a$ and $b$ are two databases that are geographically separated), so comparing bits has network overhead.

The question we are trying to address here is if there is a way to verify equality with high confidence (say 99% confidence) without paying the penalty of comparing each bit or the associated network overhead. We describe a randomized algorithm that does this, with significantly less than $O(n)$ comparisons. Conceptually, the algorithm hashes the data and compares the hashed values instead of the original data (hardly a novel idea). The interesting part of the algorithm is how it comes up with a provably good hash function.

Algorithm

  1. Treat bits of data as bits of a large number. So 1 peta byte (PB) of data becomes a single number with $2^{53}$ bits, meaning its value can be of the order of $2^{2^{53}}$.
  2. Since we want to compare two pieces of data, $a$ and $b$, conceptually we are looking at the number $c=|a-b|$. In particular, we are interested in whether $c=0$, since $c=0$ exactly when the two inputs are equal.
  3. A number $c$ cannot have more than $\lfloor{\log_2{c}}\rfloor$ unique prime divisors. So a number representing 1 PB of data has no more than $2^{53}\approx 8\times{10^{15}}$ unique prime divisors.
  4. Choose $\tau=2(\log_2{c})^2\log_2({\log_2{c}})$. For 1 PB, $\tau\approx{7\times 10^{32}}$. By the prime number theorem, this number has about $10^{31}$ prime numbers less than it. These are enough prime numbers such that if we were to draw one at random from it, the probability that it is one of the prime divisors of $c$ is less than $1/\log_2{c}$. In other words, the number of primes less than $\tau$ is about the square of the number of prime divisors of $c$. For our example, this probability is less than $1/2^{53}\approx1.25 \times 10^{-16}$, if I got the calculations correct.
  5. Choose a prime number less than $\tau$ at random. Note that this involves choosing $O(\log\tau)$ random bits. In our example, this is about 110 bits.
  6. Construct a hash function $F_p(x) = x \mod p$. 
  7. Determine $F_p(a)$ and $F_p(b)$. Since these hash functions are mod functions, the output is no more than $\log_2(\tau)$ bits, in our case about 110 bits.
  8. If $F_p(a)=F_p(b)$, then we say that $a=b$. Otherwise, we say that $a\neq b$. So we compared $O(\log{\tau})$ number of bits, instead of $O(n)$.
  9. The probability that this algorithm errs is less than $1/\log_2{c}$. So in our case, it is $1.25 \times 10^{-16}$, which is practically zero.
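
Below is a minimal sketch of this fingerprinting scheme (my own illustration; the prime is generated locally here, whereas in the distributed setting one side would pick $p$ and send it, together with its fingerprint, to the other side).

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test (probabilistic)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits=110):
    """Pick a random prime of roughly `bits` bits (around tau in the text)."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def fingerprints_match(a: bytes, b: bytes) -> bool:
    """Compare a and b via a random-prime fingerprint.

    A mismatch is always correct; a match is wrong only if the random prime
    happens to divide |a - b|, which occurs with tiny probability.
    """
    p = random_prime()
    return int.from_bytes(a, "big") % p == int.from_bytes(b, "big") % p

print(fingerprints_match(b"exactly the same data", b"exactly the same data"))  # True
print(fingerprints_match(b"exactly the same data", b"exactly the same dat4"))  # almost surely False
```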

Conclusion and notes

  • To determine equality of 1PB of data, instead of comparing $2^{53}$ bits of data, we compared 110 bits of data. This introduced a minuscule probability of error.
  • However, generating large random prime numbers is computationally expensive. But that work is CPU-bound, so we effectively replaced a network-intensive operation with a CPU-intensive one, at the cost of a small probability of error.
  • Computing hash function $F_p(x)$ needs access to the entire content of $x$, which is potentially a disk intensive operation.

References

  • Motwani, R. and Raghavan, P., Randomized Algorithms, 2007.

Friday, June 15, 2012

On Weak and Strong Laws of Large Numbers

Introduction

In statistics, we use the mean calculated from a sample as an estimate for the mean of the population. E.g. if the average height of a random sample of a thousand people from a region is 6 feet, then we estimate that the average height of all people in that region is 6 feet. Why does this work? The weak and strong laws of large numbers provide a theoretical basis for why this works. Given below are the laws themselves and their difference, which serves as a justification for their names.


Notation

  1. Let $X_1$, $X_2$, $\cdots$, $X_n$ be independent and identically distributed random variables. (In layman terms, $X_1$ is the first observation, $X_2$ is the second and so on.)
  2. Let $M_n$ be a random variable denoting the mean of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, \[M_n=\frac{1}{n}\sum_{i=1}^{n}X_i\]So this is the mean of the sample.
  3. Let $\mu$ be the mean of each of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, $\mathbf{E}(X_i)=\mu$ for each $i$. So $\mu$ is the mean of the population (usually unknown, which is why we want to estimate it!).

Weak Law of Large Numbers

This law states that for any $\epsilon>0$,
\[\lim_{n\to\infty}\mathbf{P}(|M_n-\mu|>\epsilon)=0\]

Interpretation
  • For large values of $n$ (i.e. $n>n_0$ for some $n_0$), the probability that the value of $M_n$ (the sample mean) differs from the population mean $\mu$ by more than any given number $\epsilon$ becomes arbitrarily small - it tends to 0 as $n\to\infty$.
  • Alternatively, in the limit, essentially all of the probability is concentrated in an $\epsilon$-interval around $\mu$.
  • Alternatively, with probability approaching 1, for large samples the sample mean is within an $\epsilon$ neighborhood of the population mean.
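
For a one-line justification (not in the original post), assume each $X_i$ also has finite variance $\sigma^2$. Then $\mathbf{Var}(M_n)=\sigma^2/n$, and Chebyshev's inequality gives \[\mathbf{P}(|M_n-\mu|>\epsilon)\leq\frac{\mathbf{Var}(M_n)}{\epsilon^2}=\frac{\sigma^2}{n\epsilon^2}\to 0\quad\text{as }n\to\infty.\]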

Strong Law of Large Numbers

This law states that
\[\mathbf{P}(\lim_{n\to\infty}M_n=\mu)=1\]

Interpretation

  • With probability 1, the sequence of sample means $M_n$ converges to $\mu$: for any $\epsilon>0$, $M_n$ differs from $\mu$ by more than $\epsilon$ for only finitely many $n$.
  • Alternatively, in the limit, all probability is concentrated at $\mu$.
  • Alternatively, almost surely, the sample mean eventually gets arbitrarily close to the population mean and stays there.

Difference between the two laws

  • The strong law is stronger than the weak law because it asserts almost-sure convergence of the entire sequence $M_n$ to $\mu$, while the weak law only asserts convergence in probability: for each (large) $n$ taken separately, $M_n$ is unlikely to be far from $\mu$.
  • Per the strong law, in the limit all probability is concentrated exactly at $\mu$; per the weak law, for each large $n$ it is merely concentrated in the interval $(\mu-\epsilon,\mu+\epsilon)$, for whatever $\epsilon>0$ we choose.
  • Under the strong law, with probability 1, the sample mean $M_n$ differs from the population mean $\mu$ by more than $\epsilon$ for only finitely many $n$. Now that is a very strong statement!
  • Under the weak law alone, $M_n$ may differ from $\mu$ by more than $\epsilon$ for infinitely many $n$; the law only requires that the probability of such a deviation at any particular (large) $n$ be small. This is clearly weaker than the previous statement.
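
A quick simulation makes the convergence visible (my own sketch, assuming fair six-sided die rolls as the observations, so the population mean is $\mu=3.5$):

```python
import random

random.seed(1)
mu = 3.5                               # population mean of a fair six-sided die
total = 0.0
for n in range(1, 100_001):
    total += random.randint(1, 6)      # X_n: the n-th observation
    if n in (10, 100, 1_000, 10_000, 100_000):
        m_n = total / n                # M_n: the sample mean after n observations
        print(f"n={n:>6}  M_n={m_n:.4f}  |M_n - mu|={abs(m_n - mu):.4f}")
```

The printed deviations tend to shrink as $n$ grows, illustrating the convergence of the sample mean to the population mean.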


Friday, May 18, 2012

Freivalds' Algorithm

Introduction

Freivalds' algorithm is used to verify correctness of matrix multiplication. It is a randomized Monte Carlo algorithm and has a small probability of error. By repeatedly running this algorithm, one can reduce the probability of error below any threshold.


Motivation

One naive way to verify correctness of matrix multiplication is to redo the multiplication and compare the output with the given input. However, matrix multiplication itself is expensive - the fastest known algorithm (at the time of writing) runs in $O(n^{2.376})$ time. Freivalds' algorithm takes $\Theta(n^2)$ time to verify a product and errs with probability $\leq 1/2$, and only when the claimed product is incorrect. It therefore runs faster than redoing the multiplication and is usually correct.


Freivalds' algorithm

  1. Input: $n\times{n}$ matrices $A$, $B$ and $C$. We would like to verify if $A\times{B}=C$.
  2. Choose an $n\times{1}$ column vector $\overrightarrow{r}\in\{0,1\}^n$ uniformly at random. In other words, each component of the vector is either 0 or 1, chosen independently and uniformly.
  3. Compute $A\cdot(B\cdot\overrightarrow{r})$ and $C\cdot\overrightarrow{r}$. This takes $\Theta(n^2)$ time.
  4. Now comes the output
    1. If $A \cdot (B\cdot\overrightarrow{r}) \neq {C\cdot\overrightarrow{r}}$, output FALSE. In other words, we say that the given matrix multiplication is incorrect. This has a probability of correctness = 1.
    2. If $A \cdot (B\cdot\overrightarrow{r}) = {C\cdot\overrightarrow{r}}$, output TRUE. In other words, we say that the given matrix multiplication is correct. This has a probability of correctness $\geq{1/2}$. (See references for details.)
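
A minimal sketch of this check (my own illustration, using NumPy and integer matrices so the comparison is exact); repeating it $k$ times with fresh random vectors drives the error probability below $1/2^k$:

```python
import numpy as np

def freivalds(A, B, C, k=20):
    """Verify A @ B == C with k randomized checks (Monte Carlo).

    Each round multiplies by a random 0/1 vector r and compares A(Br) with Cr
    in Theta(n^2) time. FALSE is always correct; TRUE is wrong with
    probability at most 1/2**k (and only when the claimed product is wrong).
    """
    n = A.shape[0]
    for _ in range(k):
        r = np.random.randint(0, 2, size=(n, 1))
        if not np.array_equal(A @ (B @ r), C @ r):
            return False
    return True

rng = np.random.default_rng(0)
A = rng.integers(0, 10, (4, 4))
B = rng.integers(0, 10, (4, 4))
C = A @ B
print(freivalds(A, B, C))    # True: the product is correct
C[0, 0] += 1                 # corrupt a single entry
print(freivalds(A, B, C))    # False with overwhelming probability
```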

Discussion

  • If Freivalds' algorithm says that:
    • $A\times{B}\neq{C}$, then this statement is always correct.
    • $A\times{B}={C}$, then because we are only doing a small number of checks ($n$ of them - one for each row of the result), it is possible that in fact $A\times{B}\neq{C}$ but the algorithm could not detect this in the time it has. (See references for details.)
  • Stated another way:
    • when $A\times{B}={C}$, Freivalds' algorithm is always correct.
    • When $A\times{B}\neq{C}$, the algorithm errs with probability $\leq{1/2}$.
  • By re-running this algorithm $k$ times and choosing a random $\overrightarrow{r}$ each time, we can reduce the probability of error to $\leq{1/2^{k}}$.
  • In other words, in $\Theta({kn^2})$ time, Freivalds' algorithm guarantees that the probability of error in verifying the product of two $n\times{n}$ matrices is $\leq{1/2^{k}}$.

References

  1. Wikipedia - Freivalds' algorithm
  2. Motwani, R. and Raghavan, P., Randomized Algorithms, 2007.


Friday, February 10, 2012

Randomized Rounding for Integer Programming

Motivation
  1. Solving equations that allow just integer solutions is usually harder than solving for fractions. An example is the knapsack problem, whose fractional form can be solved in linear time, but its integer form is NP-complete.
  2. An intuitive way to reduce the computational complexity of integer programming is to approximate the integer solution by the following process:
    1. Solve the fractional problem instead.
    2. Round off fractions to integers "intelligently" and
    3. Know the bounds on error or non-optimality introduced by this approximation.
  3. In this article, we round off fractions randomly (based on some probability), hence the name.
Introduction
This article is motivated by the "Wiring Problem" as discussed in the  book Randomized Algorithms by Motwani and Raghavan. This problem can be stated as follows:
Given a grid of squares, lay out wires between specified pairs of squares. The number of wires crossing any boundary of any square must be no more than a given number $\omega_0$. Furthermore, minimize the maximum number of wires that cross any boundary.
Refer to the diagram below that shows a 3x3 grid, with 3 wires (labeled in red). For clarity, boundaries are labeled in blue separately. Each row has 2 interior vertical boundaries and each column has 2 interior horizontal boundaries, for a total of 12 for the grid.


We limit ourselves to solutions containing a maximum of one "turn" in the wire. In the above diagram, wires 1 and 2 have one turn in them, while wire 3 has no turns ("it is straight").

Solving this problem involves solving the integer programming problem, which is NP-complete. Instead, we will solve it using randomization, which reduces the complexity significantly (down to the complexity of solving the fractional linear programming problem). Given below is a mathematical analysis of this solution.

Notation
  1. We number wires from $1$ to $a$. So, in the above diagram, we have $a=3$. We use the subscript $i$ for wires.
  2. For each wire $i$, we have 2 variables - $h_i$ and $v_i$ denoting whether the first part of the wire is horizontal or vertical.
    1. If the first part is horizontal, then $h_i=1$ and $v_i=0$.
    2. If the first part is vertical, then $h_i=0$ and $v_i=1$.
    3. So, for all wires $h_i,v_i\in\{0,1\}$
    4. In the above wiring, {$h_1=0$, $v_1=1$}, {$h_2=1$, $v_2=0$} and {$h_3=0$, $v_3=1$}.
  3. Additionally, this means that exactly one of $h_i$ and $v_i$ is 1 and the other is 0, since the first part of any wire is either horizontal or vertical, and not both. So, for all wires: \[h_i+v_i=1\]
  4. We number boundaries from $1$ to $b$. So, in the above diagram, we have $b=12$. We use the subscript $j$ for boundaries.
  5. For each boundary $j$, we have 2 variables - $H_j$ and $V_j$ - denoting the wires that pass through that boundary.
    1. $H_j$ is the set of all wires that start out horizontally and pass through boundary $j$.
    2. Analogously, $V_j$ is the set of all wires that start out vertically and pass through boundary $j$.
    3. As examples, $|H_3|=1$, $|V_3|=0$ and $|H_7|=0$, $|V_7|=1$. It is important to note that $|H_9|=1$ and $|V_9|=1$, since one wire that starts out horizontally (wire 2) passes through boundary 9, and so does one wire that starts out vertically.
  6. Let $F_j$ denote the number of wires passing through boundary $j$. Since that number is no more than $\omega_0$: \[F_j=\sum_{i\in{H_j}}{h_i}+\sum_{i\in{V_j}}{v_i}\leq\omega_0\]
    1. The first summand corresponds to the number of wires that start out horizontally and pass through boundary $j$.
    2. The second summand corresponds to those that start out vertically and pass through boundary $j$.
    3. Their sum, which denotes the total number of wires passing through boundary $j$, obviously has to be no more than the specified constraint, $\omega_0$.
    4. Note that $F_j$ is an objective function in the linear programming terminology.
  7. Let $F=\max{F_j}$, i.e. the maximum number of wires passing through any boundary with a given solution. The solution with the least $F$ value is the optimal solution. In other words, the solution that requires the lowest maximum of wires passing through any boundary is defined as the optimal solution.
Randomized Rounding
  1. We first solve the linear programming problem using fractional values. There are well-known algorithms that solve this problem in polynomial time.
  2. Let $\widehat{h_i}$ and $\widehat{v_i}$ denote the fractional solution values (e.g. 0.65 and 0.35; they are constrained to add up to 1). Note that they are the fractional counterparts of the integer solutions $h_i$ and $v_i$ (e.g. 1 and 0). So we have \[\widehat{F_j}=\sum_{i\in{H_j}}\widehat{h_i}+\sum_{i\in{V_j}}\widehat{v_i}\leq\widehat{\omega}\leq\omega_0\hspace{4mm}(1)\] where $\widehat{F_j}$ is the number of wires crossing boundary $j$ in the fractional case (so this will generally be a fraction, e.g. $2.8$) and $\widehat{\omega}=\max_j{\widehat{F_j}}$ is the optimal value of the fractional program.
  3. The reason $\widehat{\omega}\leq\omega_0$ is that the fractional solution $\widehat{\omega}$ could potentially be "more optimal" (because it is not constrained by integer values) and hence lower than $\omega_0$.
  4. Since $h_i$ and $v_i$ denote how wire $i$ is to be oriented, $\widehat{h_i}$ and $\widehat{v_i}$ can be considered as approximations (in fact, probabilities) to this orientation.
  5. Here comes the crux of the randomized rounding approach. We round off $\widehat{h_i}$ to 1 with probability $\widehat{h_i}$ and to 0 with the remaining probability. E.g. if the fractional linear programming solution has $\widehat{h_i}=0.65$ and consequently, $\widehat{v_i}=0.35$, then we round off $\widehat{h_i}$ to 1 with probability 0.65 and to 0 with probability 0.35.
  6. Let's call the rounded-off variables $\overline{h_i}$ and $\overline{v_i}$ respectively. Note that these are random variables, not fixed values. That means $\mathbf{P}(\overline{h_i}=1)=\widehat{h_i}$ and $\mathbf{P}(\overline{h_i}=0)=1-\widehat{h_i}$. Note that the mean of this variable is $\mathbf{E}(\overline{h_i})=\widehat{h_i}$.
  7. What can we say about the value of the objective function for these random variables (which denotes the number of wires crossing boundary $j$ in the randomized scenario)? In other words, what can we say about the random variables \[\overline{F_j}=\sum_{i\in{H_j}}\overline{h_i}+\sum_{i\in{V_j}}\overline{v_i}\]
  8. We derive the expected value (mean) of these random variables. \[\mathbf{E}(\overline{F_j})=\mathbf{E}\left(\sum_{i\in{H_j}}\overline{h_i}+\sum_{i\in{V_j}}\overline{v_i}\right)\] By linearity of expectations, we have \[=\sum_{i\in{H_j}}\mathbf{E}(\overline{h_i})+\sum_{i\in{V_j}}\mathbf{E}(\overline{v_i})\] \[=\sum_{i\in{H_j}}\widehat{h_i}+\sum_{i\in{V_j}}\widehat{v_i}\leq\widehat\omega\hspace{4mm}\text{(from (1))}\]
  9. In other words, $\mathbf{E}(\overline{F_j})\leq\widehat{\omega}\hspace{2mm}\forall{j}$. This means that the mean value of each of the objective functions $\overline{F_j}$ is bounded from above by $\widehat\omega$ (which, by the way, is at least as good as our original bound of $\omega_0$, from (1)).
  10. This means that, on average, the number of wires crossing a boundary in the randomized scenario is no more than the corresponding value in the fractional scenario, for each boundary. This is huge, since that value is no worse than the optimal value in the integer scenario! (A small sketch of the rounding step follows this list.)
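
The rounding step itself is tiny. Below is a minimal sketch (my own illustration; it assumes the fractional values $\widehat{h_i}$ have already been obtained from an LP solver):

```python
import random

def round_orientations(h_hat):
    """Randomized rounding of fractional wire orientations.

    h_hat[i] is the fractional LP value for "wire i starts horizontally".
    Returns 0/1 pairs (h_i, v_i) with P(h_i = 1) = h_hat[i], so that
    E[h_i] = h_hat[i] and h_i + v_i = 1 for every wire.
    """
    rounded = []
    for p in h_hat:
        h = 1 if random.random() < p else 0
        rounded.append((h, 1 - h))
    return rounded

# Hypothetical fractional solution for 3 wires, e.g. from an LP solver:
h_hat = [0.65, 1.0, 0.2]
print(round_orientations(h_hat))   # e.g. [(1, 0), (1, 0), (0, 1)]
```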
We now analyze the sub-optimality introduced in the objective functions because of using the randomly rounded variables. More concretely, this means that we analyze the bounds on the increase in the number of wires passing through each boundary due to randomly choosing the direction of each wire.

Probabilistic analysis
  1. Let $\omega_S$ be the maximum number of wires passing through any boundary in our solution $S$ obtained from the above randomized rounding algorithm.
  2. We would like to determine how far this $\omega_S$ deviates from either our optimal solution $\omega_0$ for integer values or $\widehat{\omega}$ for fractional values.
  3. Graphically, in the diagram below, we are trying to find out how far $\omega_S$ is from $\widehat{\omega}$ or $\omega_0$, if we are willing to tolerate a maximum error of $\epsilon$.


  1. If the grid has $n$ squares, we have no more than $2n$ boundaries. This implies that, by a union bound, we can allow a failure probability of no more than $\epsilon/2n$ per boundary.
  2. Chernoff bound gives us the following: \[\mathbf{P}[X\geq(1+\delta)\mu]<\left\{\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right\}^\mu=\epsilon/2n\] where $\mu$ is the mean and $\delta$ is the relative distance away from the mean.
  3. In our case, $\mu=\widehat{\omega}$, and we would like to find the value of $\delta$ for which this failure probability equals $\epsilon/2n$. The exact value of $\delta$ that achieves a specified bound of $\epsilon/2n$ is complicated to determine, and we denote it by $\Delta^{+}(\mu,\epsilon/2n)$.
  4. Substituting this value into the Chernoff bound and taking a union bound over all boundaries, we get, with probability at least $1-\epsilon$, \[\omega_S<\widehat{\omega}(1+\Delta^{+}(\mu,\epsilon/2n))\]
  5. While $\Delta^{+}(\mu,\epsilon/2n)$ is hard to determine in closed form, it is typically small when $\widehat{\omega}$ is large; see the references for its precise dependence on $\widehat{\omega}$ and $n$.
  6. This means that as the number of squares in the grid increases, the solution given by the randomized case increasingly approximates the solution given by the fractional case! In other words, the orientation of each wire determined by the randomized case is increasingly the optimal orientation, as the size of the grid increases.
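
Although $\Delta^{+}(\mu,\epsilon/2n)$ has no simple closed form, it is easy to compute numerically. A small sketch (my own illustration, with made-up values of $\mu$, $n$ and $\epsilon$):

```python
import math

def chernoff_tail(mu, delta):
    """Chernoff upper bound on P[X >= (1 + delta) * mu]."""
    return math.exp(mu * (delta - (1 + delta) * math.log(1 + delta)))

def delta_plus(mu, target):
    """Smallest delta whose Chernoff bound drops to `target`, by binary search."""
    lo, hi = 0.0, 1.0
    while chernoff_tail(mu, hi) > target:   # grow the bracket until the bound is small enough
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chernoff_tail(mu, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical numbers: fractional optimum of 20 wires, a 9-square grid, epsilon = 0.01
mu, n, eps = 20.0, 9, 0.01
d = delta_plus(mu, eps / (2 * n))
print(f"Delta+ = {d:.3f}, so omega_S < {mu * (1 + d):.1f} with probability >= {1 - eps}")
```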

Summary
We took an integer programming problem (which is NP-complete), solved it using fractional linear programming (which takes polynomial time), used randomization to round each fraction to its integer equivalent, and found that for large grids, the solution this approximation yields is fairly close to the optimal solution of the original integer programming problem.

References
  1. Motwani, R. and Raghavan, P., Randomized Algorithms, 2007.