Numerical computations in C++ : Converting from Decimal to Binary

Converting from decimal to binary notation is fairly standard and straightforward.

Let’s say we need to find the binary representation of 100_{10}.

The method that most of us would have been taught in school is the following:

- Divide 100 by 2. You get 50, with remainder = 0. Write 0.

- Divide 50 by 2, to get 25 and remainder = 0. Write 0.

- Divide 25 by 2 to get 12 and remainder = 1. Write 1.

- Continue doing so until the number (which is 25 at present) becomes 0.

- The binary representation of 100 is the reverse of what you have written.

——

We can write a recursive function to compute the binary representation of a decimal number. Using the power of recursion, we can avoid having to reverse the output as is necessitated by the above method.

——-

// Function dec_to_bin outputs the binary equivalent of the decimal number n.

void dec_to_bin ( int n)

{

if ( n < 0)

{

cout << “-”;

n = -n;

}

if (n == 0)

return;

dec_to_binary (n/2);

cout << n%2;

}

——–

dec_to_bin will take time proportional to the number of bits in the binary equivalent of n.

——–

Numerical computations in C++ : Converting from Decimal to Binary – 2

Earlier we looked at a recursive method for conversion from decimal to binary. We can also achieve the conversion in an iterative manner, however adding to the task the additional overhead of reversing the answer that we obtain.

The underlying method is the one discussed at the beginning of the previous post. At each iteration, we divide a number by 2 and store the corresponding remainder in an array. At the end of the process, we reverse the contents of the array.

Now, we are faced with a small problem. How do we know what the size of our array should be? Fortunately, we can proceed without knowing that, using the resizeable vector container class. (A vector is like a dynamic array. We need not specify its size at the time of definition.)

In order to reverse the vector, we can simply use the reverse algorithm defined in the algorithm header. Or if we simply want to output the binary representation (without needing to store it anywhere), we may simply use a reverse_iterator to iterate through our vector.

———–

// function dec_to_bin outputs the binary equivalent of the decimal input n.

#include <iostream>

#include <vector>

using namespace std;

void dec_to_bin ( int n )

{

vector <int> v; // vector of ints to store the binary representation.

int temp = n;

if ( n < 0)

temp = -n;

while ( temp != 0)

{

v.push_back (temp%2);

temp / = 2;

}

vector<int> reverse_iterator i;

if ( n < 0)

cout << ” – ” ;

for ( i = v.rend ( ); i != v.rbegin ( ); ++i)

cout << *i;

}

——-

If we used a list or a deque instead of a vector, we wouldn’t be faced with the task of having to reverse the values computed by us. We can do this by storing successive numbers that we generate at the front of the data structure instead of at the end.

We may do that using the push_front member function for lists and deques. (Note that we cannot use push_front for a vector. We can only insert and delete elements (efficiently i.e. in constant time) at the back of a vector.)

——

The above program using a deque along with push_front would look as follows:

——

#include <iostream>

#include <deque>

#include <algorithm>

using namespace std;


void dec_to_bin ( int n )

{

deque <int> v;

int temp = n;

if ( n < 0)

temp = -n;

while (temp != 0)

{

v.push_front ( temp%2);

temp /=2;

}

if ( n < 0)

cout << “-”;

copy (v.begin( ), v.end ( ), ostream_iterator<int> (cout,”");

}

——-

Instead of using an iterator to iterate through the deque, we can use the copy algorithm defined in the header algorithm to directly output the contents of the deque. (Note that the statement #include <algorithm> though written as part of the above program is not necessary, since algorithm.h automatically gets included when we write #include <deque>. )

——-

Numerical computations in C++ – Exponentiation by repeated squaring

Let’s say that we have to compute the power of a number without using the inbuilt pow function. We can compute the answer using the method of repeated squaring.

Firstly, we assume that the exponent is an integer.

For example. let us compute (13.5)^5.

Now, we know that 5 is 101 in binary notation.

Hence, what we need to compute is 13.5^4 \times 13.5^0 \times 13.5^1

We also observe that 13.5^4 = (13.5^2)^2 and that 13.5^2 = (13.5^1)^2.

From these observations, we can deduce that repeated squaring of 13.5 and multiplication of the appropriate intermediate results would lead us to the desired answer. (i.e. Compute 13.5^1, 13.5^2, 13.5^4 and take the product of 13.5^1 and 13.5^4.)

Here, prior to the repeated squaring, we calculated the binary representation of the given exponent. We need not necessarily do that. We can do the squaring and the computation of the binary representation simultaneously.

———

Here is a C++ code fragment to illustrate that:-

//function f takes a computes x^e (where e >= 0)  and returns a long value.

long f (int x, int e)

{

int temp = e, rem; //rem stands for remainder.

long answer = 1, term = 1;


while ( temp != 0)

{

term = term * x;

rem = temp%2;

if (rem == 1)

{

answer = answer * term;

}

temp / = 2;

}

return answer;

}

——–

If we assume that all arithmetic operations take O(1) time, then the above function takes O(n) time where n represents the number of bits in the binary representation of the exponent, e.

———

Dynamic programming – Probability, gambling and a die

Let’s say we have a fair 6-sided die. Player A can earn money based on what shows up when the die is rolled. The rule of the game is as follows: Player A will roll a die upto 3 times. After each roll, Player A has two choices: (1) He can take money in dollars equal to the number showing on the upturned face of the die. (i.e. if 5 shows on the die, he can take $5.)  In this case, the die is not rolled any more, and the game ends. (2) He can decide to discard this roll of the die, and decide to roll the die again. Note that the die can be rolled at most 3 times.

Question — What is the expected payoff for Player A?

——–

Let’s say that instead of 3 rolls of the die, the game allowed only for 1 roll. What is the expected payoff for Player A in this case? It is clearly equal to the expected value of the number that shows up when the die is rolled. And since the die is fair, this expectation is equal to (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. And Player A has no choice but to accept whatever shows up on the die because there is only one roll possible. Hence, in this game, the expected payoff for Player A is $3.5.

Now, consider what happens if instead of 1 roll, we allow for upto 2 rolls.

Now, if the 1st roll is a 6, obviously Player A would pocket $6 and decide to end the game there itself. Also, if the 1st roll is a 1, Player A can always decide to roll again [because irrespective of what shows up on the 2nd roll, it cannot be less than $1, and hence Player A has nothing to lose by deciding to discard the 1st roll and opting to roll again.]

Now consider what happens if the 1st roll is a 3. Now, Player A is faced with a predicament. What should he do? Should he decide to stop there and go home with $3, or should he discard this 3 in the hope of higher returns in the 2nd and final roll?

This is where having learnt probability well in school helps.

We know that the expected value of the number that shows up when the die is rolled once is 3.5.

Hence, if in the 1st roll, Player A gets a number greater than 3.5 i.e. either 4, 5 or 6, he can decide to stop there and take the corresponding amount of money.

However, if he gets either 1, 2 or 3 in his 1st roll, he should discard that, and decide to roll again, because the expected value of the 2nd roll is 3.5.

Hence, the expected payoff for Player A from a game which allows for at most 2 rolls is: \frac{3}{6} 3.5 + \frac {1}{6} 4 + \frac{1}{6} 5 + \frac{1}{6} 6 = 4.25.

(Here, 3/6 is the probability of getting a 1, 2 or 3 on the 1st roll, in which case the expected payoff becomes 3.5. )

——-

Now, coming back to our original game, we can decide as to how we should proceed after the die is rolled once.

We know that the expected payoff from 2 rolls of the die is $4.25.

Hence the strategy of Player A should be the following:

- If the 1st roll of the die yields either a 5 or a 6, take the money, and end the game.

- Else, discard the 1st roll, and continue the game.

The expected payoff can be computed easily based on the preceding observations:

Expected payoff for Player A = \frac{4}{6} 4.25 + \frac{1}{6} 5 + \frac{1}{6} 6 = \frac {14}{3}.

———-

This is a classic question using the principle of dynamic programming. Let’s say the the game allows for the die to be tossed upto n times.

Let E[k] denote the expected payoff for Player A in a game that allows upto n-k rolls of the dice.

The strategy for Player A after roll k should be the following:

– If the number showing on the dice is greater than E[k], take the money and stop the game.

– Else, discard that roll, and decide to continue on to the next roll.

——-

The recurrence relation for computing the E[k] values can be derived in terms of E[k+1] as was done in the case of the 3-roll example that we saw earlier. Once we have done that, we can calculate the values in the order, E[n-1], E[n-2] and so on uptil E[0].

——-

Probability and Stochastic processes – Gambler’s ruin problem

Let’s say there are 2 tennis players, A and B. Now, A is a better player than B and hence wins 2/3 of the matches that they play. However, they don’t just play tennis – they also gamble on the results. Initially, let’s say that A has $1 and B has $2. The rule is that whoever loses a tennis match must give $1 to the other. They keep on playing tennis matches until one of them becomes bankrupt, at which point they stop. What is the probability that B will become bankrupt?

——

We note that whatever handing over of the money takes place between A and B, the total amount of money remains constant at $3.

There are 4 possible states —

State 0 — A has $0

State 1 — A has $1

State 2 — A has $2

State 3 — A has $3.

(Note that the amount of money that B has, in each of the above states, is completely determined by simply stating the amount of money with A since the total amount of money remains constant.)

We get the following transition diagram with probabilities depicted besides the arrows.

The probability that we wish to compute is the probability of reaching State 3 starting initially at State 1.

We can compute that quite easily. Let p_{i} denote the probability of reaching State 3 from State i.

We directly get the following from the above transition probabilities –

p_2 = \frac{2}{3} + \frac{1}{3} p_1

p_1 = \frac{1}{3} p_0 + \frac{2}{3} p_2.

p_0 = 0.

——-

Therefore, our desired probability, p_1 = 4/7, which means that there is 4/7 probability that the series of tennis matches would end with B becoming bankrupt.

——–

Probability – Monty Hall problem

Our friend Monty Hall is a game show host. Every week he asks one person from his audience to come onto the stage. On the stage are 3 closed doors. Behind one is a brand new car, and behind the other 2 are goats. The car is equally likely to be behind any of the 3 doors. The contestant selected from the audience obviously wants to win the car.

Monty Hall explains the rules of the game to the contestant — In front of you are 3 closed doors. Behind one of them is a car, and there are goats behind the other 2. I know what is behind each door. Your first task is to select a door.

Our contestant selects door 1.

—-

Monty now again starts expounding on probability — You have selected door 1. Now, I will select either door 2 or 3 to open. Note that I know what is behind each of the doors. I will decide which door to open based on the following rules:

- If door 2 has a car behind it, I will open door 3.

- If door 3 has a car behind it, I will open door 2.

- If neither of the above conditions hold true, i.e. both doors 2 and 3 have goats behind them, I will randomly select one among these 2 doors with probability 0.5 and open that door.

—-

Now, having explained thus to our contestant, the omniscient Monty goes and opens door 3, and reveals to the world (including our contestant) the goat standing behind it.

—-

Now, Monty turns to the contestant for the endgame and speaks thus — As you can can see, door 3 has a goat behind it. Doors 1 and 2 are still closed. Whichever door you select, you will get whatever is behind that door. You can either decide to stay with your original choice i.e. door 1, or you could decide to switch i.e. select door 2. What do you want to do — stay or switch?

As the music in the recording studio rises to a crescendo, our dear contestant is faced with this question on probability to answer. What should he do?

————

At first glance, it does seem that it is immaterial as to what the contestant chooses. One could argue that there are 2 doors (doors 1 and 2) one of them having a goat behind it, and the other having a car behind it. The car is equally likely to be behind either of the 2 doors. So, the contestant could as well flip a fair coin, and choose which door he wants to go with.

—-

However there is a flaw in the above reasoning.

– The probability that there is a goat behind door 1 is 2/3, and the probability that there is a car behind door 1 is 1/3.

To see why this is so, consider the following argument — Initially there were 3 doors. Two of them had goats behind them and one of them had a car behind it. Door 1 was one among these 3 doors. Therefore, the probability that door 1 has a car behind it is 1/3 and the probability that it has a goat behind it is 2/3.

Now, you could argue against the above explanation this way : ” What you said about the probability of a car being behind door 1 being 1/3, and that probability of a goat being behind door 1 being 2/3 was true initially, and not after Monty went and opened door 3, and showed everyone the goat standing behind it. Once Monty did that, we were left with 2 doors, and 1 goat and 1 car, and hence the probability that there is a car behind door 1 is 1/2 and the probability that there is a goat behind door 1 is also 1/2.”

But again, the argument that you presented above is not wholly correct. We can see it this way — we agree that initially i.e. before Monty opened door 3 and showed a goat behind that, the probability of these being a car behind door 1 is 1/3. My claim is that this remains the same even after what Monty did.

To better understand that, consider the following situation —

Assume that there are 100,000 doors instead of just 3. Also, behind 99,999 of these doors are goats and behind a single one is a car. The car is equally likely to be behind any of the 100,000 doors. So, what we have here is a 100,000-door version of Monty’s 3-door problem.

Now, you select a door, say door 385.

Now, Monty has been asked to open 99,998 doors, i.e. he has been asked to keep 2 doors closed. It is also specified in the rules that one of these closed doors should be door 385 since that was the one you selected. Further, Monty knows what is behind each of the doors, and Monty has been specifically instructed that the 99,998 doors that he opens should all reveal goats behind them.

So Monty goes and opens 99,998 doors. He leaves door 385 and one other door, say door 6724 closed. Now, as we said earlier, behind each of the 99,998 doors that Monty opened, was a goat.

Now, what do you think is the probability that there is a goat behind your selection door 385?

It is 1/100,000 and not 1/2.

The reason as to why door 385 remained closed even when just 2 doors remained (i.e. 385 and 6724) was because Monty had been instructed not to open 385 as part of the rules.

There is 1/100,000 probability that door 385 contained a car behind it, and Monty selected door 6724 uniformly at random from the remaining 99,999 doors for keeping closed. In such a situation, staying with your original selection (i.e. door 385) would win you the car.

However, it is much much more likely (99,999/100,000) that your initial selection (385) had a goat behind it, and that Monty knowing that 6724 had a car behind it, opened the remaining 99,998 doors. In this situation, switching (i.e. selecting door 6724) would get you the car.

—-

Coming back to our 3 door version, we can argue that switching would win you the car with probability 2/3, while staying with your original selection would get you the car with a probability of 1/3.

Hence, one who has a working knowledge of probability should advise the contestant to switch.

Probability and Statistics – 1 : coin flips

Question 1. I toss a fair coin until I obtain Heads. How many times am I expected to toss the coin (this number includes the coin toss which resulted in the Head)? What is the variance of the number of coin tosses?

Solution. This is exactly the geometric distribution with p = 0.5. Hence, I am expected to toss the coin 1/p = 2 times.

We can derive it as follows:

Let X be the random variable denoting the number of coin tosses.

If we get a H in the 1st toss, we are done. —- Pr [ X = 1 ] = 1/2. ( the probability is 1/2 since it is a fair coin.)

For us to have X = 2, the 1st toss must result in a T, and the 2nd in a H. —-

Pr [X=2]=0.5^2.

Similarly, X = 3 only if we get the sequence T T H in our 1st 3 coin tosses. —- Pr [ X = 3] = 0.5^3.

Thus, we have that,

E [ X ] = 1 \times 0.5 + 2 \times 0.5^2 + 3 \times 0.5^3 \ldots

\Rightarrow 0.5 E[X] = 0.5^2 + 2 \times 0.5^3 + \ldots

Subtracting the 2nd equation from the 1st, we get:

0.5 E [X] = 0.5 + 0.5^2 + 0.5^3 + \ldots

\Rightarrow E[X] = 2.

—-

Variance of X:

Var (X) = E[X^2] - (E[X])^2

We have that, E[X^2] = 1 \times 0.5 + 4 \times 0.5^2 + 9 \times 0.5^3 + \ldots

\Rightarrow 0.5 E[X^2] = 0.5^2 + 4 \times 0.5^3 + 9 \times 0.5^4 + \ldots

Subtracting the 2nd equation from the 1st:

0.5 E[X^2] = 0.5 + 3 \times 0.5^2 + 5 \times 0.5^3 + \ldots

\Rightarrow 0.25 E [X^2] = 0.5^2 + 3 \times 0.5^3 + \ldots

\Rightarrow 0.25 E[X^2] = 0.5 + 2 (o.5^2 + 0.5^3 +\ldots)

\Rightarrow E[X^2] = 6.

Therefore, Var (X) = 6 - 4 = 2.

—–

Question 2.I toss a fair coin until I get either HH or TT. Let X be the number of coin tosses I make. Find E[X].

Solution.

The 1st coin toss can result in either a H or a T. If we are given that X = k, then the remaining k-1 coin tosses are uniquely determined given the result of the 1st toss. The fixed sequence occurs with probability 0.5^{k-1}.

Thus, we have that, E [X] = 2 \times 0.5 + 3 \times 0.5^2 + 4 \times 0.5^3 + \ldots

Solving, we get, E[X] = 3.

—-

Probabilistic Method – 6

Q.1. Let X_1, ..., X_n be non-negative real numbers. If \lim_{n \rightarrow \infty} \frac{Var[X_n]} { (E[X_n])^2} = 0, then \lim_{n \to \infty} Pr [ X_n > 0] = 1.

Solution.

Pr [X_n \leq 0] \leq Pr[ | X_n - E[X_n] | \geq E[X_n] ] \leq \frac{Var[X_n]}{ (E[X_n])^2}. (the 2nd inequality follows from Chebyshev’s formula)

Taking limits as n \rightarrow \infty on both sides, we get the desired result.

-

Q.2. If p >> 1/n, prove that, as n approaches infinity, G(n,p) contains a triangle. If p << 1/n, prove that, as n approaches infinity, G(n,p) does not contain a triangle.

Solution.

Let Y_1, \ldots, Y_{ {n \choose 3} } be indicator variables for the corresponding 3-vertex-subsets inducing a triangle in G(n,p).

Let Y = \sum Y_i.

\therefore E[ Y ] = { n \choose 3} p^3. \ldots (1)

Var[Y] = \sum_i {Var [Y_i] } + \sum_{ i \neq j} {Cov[Y_i, Y_j]}

= {n \choose 3} (p^3 - p^6) + 12 {n \choose 4} (p^5 - p^6)

\leq {n \choose 3} p^3 + 12 {n \choose 4} p^5. \ldots (2)

(Here, Cov [ Y_i, Y_j] = 0 if the corresponding triangles do not share an edge; number of ordered pairs of triangles sharing an edge = 12 { n \choose 4}.)

If p << 1/n, E[Y] = 0 as n approaches infinity, proving one part of the question.

Now, from (1) and (2), \frac{Var [Y]} {(E[Y])^2} \leq \Theta ( \frac{1} {n^3 p^3} + \frac {1} { n^2 p}).

\therefore \mbox { If } p >> 1/n \mbox{, } \lim_{n \to \infty} \frac{Var[Y]} { (E[Y])^2 } = 0, which implies that lim_{n \to \infty} Pr [ Y > 0] = 1, proving the other part of the question.

-

Probabilistic Method – 5

Q.1. Prove that {2m \choose m} \geq \frac{2^{2m}}{4 \sqrt{m} + 2}.

Solution.

Toss an unbiased coin 2m times with the coin tosses being independent. Let X denote the number of heads in the coin tosses.

\therefore E[X] = m; \mbox{ } Var [X] = m/2.

Pr[ | X - E[X] | < \sqrt{m} ] \geq 1 - 1/2 = 1/2. (using Chebyshev’s inequality) \ldots (1)

Also, Pr[ \mbox{ exactly k heads occur} ] = { 2m \choose k} 2^{-2m} \leq {2m \choose m} 2^{-2m}. (because {2m \choose m} \geq { 2m \choose k} for all possible k )

\therefore Pr[ | X - E[X] | < \sqrt{m} ] \leq (2 \sqrt{m} + 1) { 2m \choose m} 2^{-2m}. \ldots (2)

Combine (1) and (2) to get the desired result.

-

Probabilistic Method – 4

Q.1. Prove that there exists a tournament on n vertices having at least \frac{n!}{2^{n-1}} Hamiltonian paths.

Solution.

A tournament is an orientation of a complete graph. Let each of the n \choose 2 edges be oriented, independently, in either direction with probability 1/2.

Take a random permutation, \sigma of the vertex set of the graph.

Pr[ \sigma \mbox{ represents a Hamiltonian path} ] = \frac {1}{ 2^{n-1}}.

Let X be a random variable denoting the number of Hamiltonian paths.

\therefore E [X] = \frac{n!}{ 2^{n-1}}.

-

Q.2. Independence number \alpha(G) is the size of a largest independent set of G = (V,E) . Prove that for any  graph G having n vertices, m edges and average degree d = \frac{2m}{n} \leq 1, \alpha(G) \geq \frac{n} {2d}.

Solution.

Let S \subseteq V be formed by sampling each vertex of G, independently, with probability p. Let X denote the number of vertices in S, and Y, the number of edges in the subgraph G[S] induced by S.

\therefore E[X] = np \mbox{ and } E[Y] = m p^2 = \frac{1}{2} n d p^2.

\therefore E[X- Y] = np(1 - dp/2).

\therefore There exists a subset S \subseteq V such that the difference between the number of vertices in S and the number of edges in G[S] is at least np(1-dp/2).

Remove a vertex for each edge in G[S] until the resultant set is an independent set. (This procedure removes at least as many edges as vertices.)

Set p =1/d.

\therefore There exists an independent set of size at least \frac{nd}{2}.

Probabilistic Method – 3

Q.1. Given a set S and S_1, \ldots, S_m \subseteq S, such that \forall i \mbox{, } |S_i| = l. A coloring of the elements of S is a valid 2-coloring if \forall i \mbox{, } S_i is not monochromatic. Prove that if m < 2^{l-1}, there always exists a valid 2-coloring.

Solution.

Randomly, and independently, assign either Red or Blue to each element of S with probabilities p \mbox{ and } 1-p respectively.

\forall i, Pr[ S_i \mbox { is monochromatic}] = 2[ p^l + (1-p)^l ].

By union bound, Pr[ \mbox{ some } S_i \mbox { is monochromatic}] = 2m [ p^l + (1-p)^l ].

If 2m [ p^l + (1-p)^l ] < 1, there exists a valid 2-coloring of S.

Put p = 1/2 to get the desired result.

-

Q.2. Ramsey number, \mathcal{R}(k,l) = min\{ n: \mbox{ any graph on } n \mbox { vertices has a clique of size at least } k \mbox {or an independent set of size at least } l \}. Prove that \mathcal{R}(k,k) > 2^{(k-1)/2}.

Solution.

Consider random graph G(n,p) on n vertices where each of the n \choose 2 possible edges is sampled with probability p.

Take a fixed subset S of k vertices.

Pr[ \mbox{ S is an independent or a clique}] = p^{ k \choose 2 } + (1-p)^{ k \choose 2}.

\therefore Pr[ G(n,p) \mbox { has a clique or independent set of size at least } k] \leq { n \choose k} [p^{ { k \choose 2 } } + (1-p)^{ { k \choose 2}}].

We want { n \choose k}[p^{ { k \choose 2 } } + (1-p)^{ { k \choose 2}} ]< 1.

Now, since { n \choose k } \leq n^k (and setting p = 1/2), we will find n such that:

n^k \leq 2^{ {k \choose 2} -1}, which is true whenever n \leq 2^{(k-1)/2}.

Hence, there exist graphs on at most 2^{(k-1)/2} vertices which do not have either a clique or an independent set of size at least k.

-

Probabilistic Method – 2

Q.1. Let n \geq 2k be positive integers. Let \mathcal{C} be a collection of pairwise intersecting k-element subsets of [n] = \{1, 2, \ldots, n\}  (i.e. if A, B \in \mathcal{C}, then A \cap B \neq \phi). Prove that |C| \leq { {n-1} \choose {k-1}}.

Solution.

Let A be a k-element subset of [n] selected uniformly at random from the {n} \choose {k} possible such subsets.

Now, Pr[A \in \mathcal{C}] \leq k/n \Rightarrow |C| \leq { {n} \choose {k} } . k/n = { {n-1} \choose {k-1}}.

We will try to prove that Pr[ A \in \mathcal{C}] \leq k/n.

Let A be selected as follows. Take a random permutation \sigma \in \mathcal{S}_n. Take a random index 1 \leq i \leq n, and let A = \{i, \ldots, i + k-1\} (wrapped around n).

Now, if given a fixed \sigma, Pr[ A \in \mathcal{C}| \sigma] \leq k/n, then Pr[ A \in \mathcal{C}] \leq k/n.

Further, since there are at most k possible indices for i such that \{\sigma(i), \ldots, \sigma(i+k-1)\} (wrapping around n) can be a part of \mathcal{C} (due to the property of pairwise intersection), Pr[ A \in \mathcal{C} | \sigma ] \leq k/n.

--

 

Probabilistic Method – 1

Q.1. Let A_1, \ldots, A_n and B_1, \ldots, B_n be distinct subsets of \mathbb{N}, such that:

  • \forall i, A_i \cap B_i = \phi.
  • \forall distinct i,j (A_i \cap B_j) \cup (A_j \cap B_i) \neq \phi.

Prove that, for every real 0 \leq p \leq 1, \sum_i {p^{|A_i|} (1-p)^{|B_i|}} \leq 1.

Solution. Let us create a random subset R by sampling each element of \mathbb{N} with probability p.

Let E_i denote the event that R contains all elements of A_i and none of B_i. Pr [E_i] = p^{|A_i|} (1-p)^{|B_i|}.

Now, the events E_i are pairwise disjoint.

Therefore, \sum_i {p^{|A_i|} (1-p)^{|B_i|}} = Pr[ some E_i occurs] \leq 1.

--

Q.2. Let A_1, \ldots, A_n and B_1, \ldots, B_n be distinct subsets of \mathbb{N} such that:-

  • \forall i, A_i \cap B_i = \phi.
  • \forall distinct i, j, A_i \cap B_j \neq \phi.
  • \forall i, |A_i| =r, |B_i| = s.

Prove that n \leq {r+s} \choose {r}.

Solution.

Consider a random permutation of the universe X consisting of all elements in the sets A_i, B_i.

Let E_i denote the event that all elements of A_i occur before every element of B_i in the permutation.

Pr [ E_i] = { {r+s} \choose {r}}^{-1}.

Now, the events E_i are pairwise disjoint.

Therefore, n. {{r+s} \choose {r}}^{-1} \leq 1.

--

Probability review

Basic probability review

  • Sample space {\Omega} can be viewed as the set of results obtained from the probabilistic experiment under consideration. e.g. If the experiment consists of flipping a coin twice, then the sample space would consist of {\{HH, HT, TT, TH \}}. Elements of {\Omega} are termed elementary events. Subsets of {\Omega} are termed events. e.g. {\{HH, HT\}} is the event that the first coin flip results in a Head.
  • Definition A {\sigma-} field {(\Omega, \mathcal{F})} consists of a sample space {\Omega} and a set {\mathcal{F}} which is a collection of subsets of {\Omega}, and satisfies the following properties:
    • {\phi \in \mathcal{F}}
    • closure under complement {\mathcal{\epsilon} \in \mathcal{F} \Rightarrow \bar{\mathcal{\epsilon}} \in \mathcal{F}}
    • closure under union { \forall i \in I, } { \mathcal{\epsilon}_i \in \mathcal{F} \Rightarrow \cup_{i \in I} \mathcal{\epsilon}_i \in \mathcal{F}}.

    Note that closure under union and closure and complement imply closure under intersection (since {A \cap B = \overline{ \bar{A} \cup \bar{B} } }). Note also that {\Omega \in \mathcal{F}}. Elements of {\mathcal{F}} can be termed as observable events, and {\mathcal{F}} can be termed as an information structure.

  • Definition Given a {\sigma-} field {(\sigma, \mathcal{F}}, a probability measure {Pr : \mathcal {F} \rightarrow \mathbb{R}^{+}} is a function which satisfies:
    • For all {\mathcal{\epsilon} \in \mathcal{F}}, {0 \leq Pr [ \mathcal{\epsilon}] \leq 1}.
    • {Pr [ \Omega] = 1}.
    • For mutually disjoint events, {\mathcal{\epsilon}_1, \mathcal{\epsilon}_2, \ldots }, {Pr [ \cup_i \mathcal{\epsilon}_i ] = \sum_{i} Pr [ \mathcal{\epsilon}_i ]}.

    Some other properties that follow from the above:

    • {Pr [ \overline{\mathop{\mathbb E}\mathcal{\epsilon}}] = 1 - Pr [ \mathcal{\epsilon}] }
    • {Pr [ A \cup B ] = Pr [ A ] + Pr [ B] - Pr [ A \cap B]}.
  • Definition A probability space {(\sigma, \mathcal{F}, Pr)} consists of a {\sigma-} field {(\sigma, \mathcal{F})} together with a probability measure {Pr} defined on it.
  • Principle of inclusion-exclusion\displaystyle  Pr [ \cup_{i=1}^{n} \mathcal{\epsilon}_i ] = \sum_i Pr [\mathcal{\epsilon}_i] - \sum_{i < j} Pr [ \mathcal{\epsilon}_i \cap \mathcal{\epsilon}_j] + \sum_{i < j < k} Pr [ \mathcal{\epsilon}_i \cap \mathcal{\epsilon}_j \cap \mathcal{\epsilon}_k] -/+ \ldots  
  • Definition Condition probability of event {A} given event {B} is defined as :\displaystyle  Pr [ A | B ] = \frac { Pr [ A \cap B ] } { Pr [B] } .

    From this, we have that : { Pr [ X \cap Y ] = Pr [ Y ] Pr [ X | Y ] = Pr [X ] Pr [Y | X ]}.

     

  • If {\mathcal{\epsilon}_1, \ldots, \mathcal{\epsilon}_k} denote a partition of the sample space {\Omega}, then, for any event {\mathcal{\epsilon}}, :\displaystyle  Pr [ \mathcal{\epsilon} ] = \sum_{i=1}^{k} Pr [\mathcal{\epsilon}_i ] Pr [ \mathcal{\epsilon} | \mathcal{\epsilon}_i] 
  • Bayes’ rule If {\mathcal{\epsilon}_1, \ldots, \mathcal{\epsilon}_k} denote a partition of the sample space {\Omega}, then, for any event {\mathcal{\epsilon}}, :\displaystyle  Pr [ \mathcal{\epsilon}_i | \mathcal{\epsilon}] = \frac { Pr [ \mathcal{\epsilon}_i] Pr [ \mathcal{\epsilon} | \mathcal{\epsilon}_i ] } { \sum_{j=1}^{k} Pr [\mathcal{\epsilon}_j] Pr [\mathcal{\epsilon} | \mathcal{\epsilon}_j]} 
  • Definition A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {independent} if for all subsets { S \subseteq I }, :\displaystyle  Pr [\displaystyle \cap_{i \in S} \mathcal{\epsilon}_i ] = \displaystyle \prod_{ i \in S } Pr [ \mathcal{\epsilon}_i]  
  • Equivalent defintion A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {independent} if for all {j \in I} and for all subsets { S \subseteq I \setminus \{j\} },\displaystyle  Pr [ \mathcal{\epsilon}_j | \displaystyle \cap_{i \in S} \mathcal{\epsilon}_i ] = Pr [ \mathcal{\epsilon}_j]
  • A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {k-wise independent} if every set of {k} events in the collection is independent. 2-wise independence is called pairwise independence.
  • Definition A random variable {X} is a real-valued function over the sample space {X : \Omega \rightarrow \mathbb{R}} such that for all {x \in \mathbb{R}}:\displaystyle  \{ \omega \in \Omega | X(\omega) \leq x \} \in \mathcal{F}
  • The distribution function { F : \mathbb{R} \rightarrow [0,1]} for a random variable {X} is defined as { F_X (x) = Pr [ X \leq x]}.
  • The density function {p : \mathbb{R} \rightarrow [0,1]} is defined as {p_X (x) = Pr [X = x ]}.
  • The joint distribution function and joint density function are defined likewise :\displaystyle  F_{X,Y} (x,y) = Pr [ \{X \leq x\} \cap \{ Y \leq y \} ] \displaystyle  p_{X,Y} (x,y) = Pr [ \{X =x\} \cap \{ Y = y \} ]

    .

Infinity of primes

Q.1. Prove that there are infinitely many primes.

Solution.

Assume to the contrary a finite set of primes \{p_1, p_2, ..., p_k\}. \ldots (1).

Consider n = p_1 \times p_2 \times ... \times p_k + 1.

Now, p_i | n-1 \forall 1 \leq i \leq k .

\therefore n is not a multiple of any prime from the assumed finite set.

Hence, n satisfies the definition of a prime number (i.e. one which is not a multiple of primes strictly smaller than itself.)\ldots (2).

(1) and (2), implies contradiction.

 

Q.2. Prove that there are infinitely many primes of the form 4n -1.

Solution.

Claim. A number of the form 4n-1 always has a prime factor of the form 4n-1.

Proof of claim.

Consider any k = 4n-1.

If k is prime, we are done.

Else, k = k_1. k_2, where k_1 \equiv 1 (mod 4) and k_2 \equiv 3 (mod 4). (verify!)

If k_2 is prime, we are done. Else, proceed as above.

The process concludes (if it does not find a greater prime of the form 4n-1) at 3.

Hence, the claim is proved.

-

Assume there are only a finite number of primes of the form 4n-1. Their product is either of the form 4n-1 or 4n+1. Add either 4 or 2, as the case may be, to obtain the next number, call it p, of the form 4n-1.

Now, p is not a multiple of any of the primes from the assumed finite set, contradicting the statement of the claim.

-

 

 

More on the Dial-a-Ride problem

We present here a so-called Structure Theorem given by Gupta et al. [1] for the non-preemptive Dial-a-Ride problem on an n-vertex metric space (V,d) having m demand pairs, and a specified start vertex, r.

We say that a vehicle is in a pick-up phase during some portion (segment) of a tour, if it only performs pick-ups during that time interval. Similarly, we say that a vehicle is in a delivery phase during a particular segment if it only performs deliveries in that segment. If the optimal tour had a structure which allowed it to be split into pick-up and delivery phases, i.e. if the optimal tour could be split as follows: the vehicle first picks up a set of at most k items; it then delivers each of those items to their destinations; it again picks up a set of at most k items, and so on, – we would have been able to apply the greedy paradigm for the problem. We could select a best-possible set of (at most k) items to service during a segment.

However, the optimal tour may not exhibit such a structure. Gupta et al. show that a near-optimal tour displays this property.

Theorem 1. Given an instance of non-preemptive Dial-a-Ride, there exists a feasible tour, \tau within O(\log {m}) of optimal, such that \tau can be split into a set of segments – \tau = S_1. S_2 ... S_t , wherein each segment, S_i services a set, O_i of at most k demand pairs (items) by first picking up all items in O_i and then delivering them to their respective destinations.

Proof. Consider an optimal tour \sigma. Represent \sigma as a line L such that every edge in L corresponds to an edge traversal in \sigma. The i^{th} edge of L corresponds to the i^{th} edge traversed in \sigma. (That is, L is simply the tour \sigma, wherein vertices visited multiple times in \sigma are represented that many times in L.)

By the triangle inequality, we have that the number of edges in \sigma, denoted by | \sigma | (equal to the number of edges in L) is at most 2m + 1.

Let d(\sigma) denote the length of \sigma.

Define stretch of item i as the number of edges traversed in \sigma after it is picked up, and before it is delivered (i.e. the number of edges it travels on the vehicle).

Divide the m items into groups G_j, j = 1, ..., \lceil \log {(2m)} \rceil, such that G_j is the set of all items having stretch between 2^{j-1} and 2^j.

We shall service items in each group separately.

Consider group G_j. Denote by \tau_j the portion of the tour where items in G_j are serviced.

We first prove a lemma relating to \tau_j.

Lemma 2. There exists a tour \tau_j of length at most 6 d(\sigma), such that \tau_j can be divided into a set of segments, each servicing at most k items in a manner similar to that stated in Theorem 1.

Proof. Let r = 2^{j-1}. Divide L into segments having r edges each. There are \lceil \frac { |\sigma|} {r} such segments- the first r edges in L form the first segment (denoted by T_{1,j}), the next set of r edges forms segment T_{2,j}, and so on.

Denote the set of items originating in segment T_{l,j} by O_{l,j}. (i.e. Items in O_{l,j} have sources between vertices (l-1)r and lr -1. Note that items in O_{l,j} are exactly those items that cross edge (lr-1, lr) in \sigma. This implies that |O_{l,j}| \leq k. Note also that items in O_{l,j} have destinations between the vertices lr and (l+2)r - 1.

Consider set O_{l,j}.

We service it as follows: Start the vehicle (initially empty) at vertex (l-1)r. Pick up items in O_{l,j} originating in T_{l,j}. Deliver these items (\leq k) to their destinations in T_{l+1,j} and T_{l+2,j}. Return with the empty vehicle to vertex lr. Denote by \omega_{l,j} this portion of the tour where items in O_{l,j} are serviced.

By concatenating the portions \omega_{l,j} for possible values of l, we obtain a tour, \tau_j that services all items in G_j.

We make the following observations:

  • Each edge in L is a part of at most 3 portions/segments: \omega_{l,j}.
  • Each edge is traversed at most twice in any segment/portion.
  • The number of items carried by the vehicle at any time in any \omega_{l,j} is at most k.

The above observations make it clear that \tau_j has length at most 6 d(\sigma) proving Lemma 2.

We service items in each of the \lceil \log {(2m)} \rceil groups, G_j similarly. We thus get a tour that is O(\log {m}) d(\sigma), completing the proof of Theorem 1.

References:

[1] A. Gupta et al – Dial-a-Ride from k-forest, 2010.

On the Dial-a-Ride problem – 3

We present here a polynomial-time 2-approximation algorithm given by Charikar and Raghavachari [CR 2001] for the Capacitated Preemptive version of Dial-a-Ride on trees. For ease of visualization, we consider rooted trees, though the algorithm works as such for any tree.

The items are assumed to be located on the leaves of the tree, T. Consider a demand pair \{s_i, t_i\}. We will call the least common ancestor of s_i and t_i (the least-depth vertex in the unique path between s_i and t_i) as the turning point for that demand pair. Given the m demand pairs, all turning points can be easily computed in polynomial time. The algorithm that follows makes use of these pre-computed turning points.

The algorithm modifies the usual depth-first search, and runs two modified traversals in sequence. In the first traversal, called the upsweep, all items are shifted upwards from their sources to their respective turning points. In the second traversal, called the downsweep, the items are shifted downwards to their respective destinations.

Consider any edge e.

  • In the upsweep phase, the algorithm recursively explores the sub-tree rooted at e at the end of which all items that must necessarily pass upwards through e are shifted to the lower end point of e. The total number of such items is flow_u(e). The items are then shifted upwards across e. Clearly, edge e is traversed 2 max \{ 1, \lceil \frac{flow_u(e)}{k} \rceil \} times.
  • In the downsweep phase, items that need to pass downwards across e are shifted to the upper end point of e before the subtree rooted at e is recursively explored. Again, it is clear that e is traversed 2 max \{1, \lceil \frac{flow_d(e)} {k} \rceil \} times in the downsweep.

Notes:

  • Preemption is used in carrying items across edge e in the above description. The vehicle travels back and forth across e carrying items to the other end point.
  • Clearly, the algorithm runs in polynomial time.
  • This is a 2-approximation algorithm. This is obtained directly from the Steiner tree, and the flow lower bounds.

References:

  • [CR 2001] M. Charikar, B. Raghavachari: The Finite Capacity Dial-a-Ride Problem, 2001.

On the Dial-a-Ride problem – 2

We consider here the capacitated Dial-a-Ride problem on the tree metric. The m demand pairs \{(s_i, t_i)\}_{i=1}^{m} are assumed to be on the leaves of the tree, T.

  • For each demand pair (s_i, t_i), we have a unique path p_i from s_i to t_i in T.
  • Call all vertices which are on any of the paths \{p_i\}_{i=1}^{m} as interesting.
  • Call all edges which are on any of the paths \{p_i\}_{i=1}^{m} as essential.
  • Clearly, any tour that satisfies all demand pairs must cover all essential edges, and hence, all interesting vertices.

We present two lower bounds on the number of edge traversals in the optimal tour.

1. Steiner tree lower bound:- Let Q be the least cost Steiner tree spanning all interesting vertices. Q would include all essential edges, and if \cup_{i=1}^{m} p_i is not connected, Q will include other edges too. Clearly, any tour that satisfies all demand pairs must traverse every edge of Q at least twice. We call this the Steiner tree lower bound.

2. Flow lower bound:- With every edge e in T, we associate two quantities flow_u(e) and flow_d(e). For every edge e = \{u,v\}, fix, for the purpose of computation of flow_u(e) and flow_d(e), an ordering of the end points. flow_u(e) equals the number of paths, p_i which require an item to pass from vertex u to vertex v. flow_d(e) corresponds to the number of paths, p_i which require an item to pass from vertex v to vertex u. (Here, ‘u’ denotes ‘upwards’, and ‘d’ denotes ‘downwards’.)

Clearly, for a vehicle of capacity k, edge e must be traversed at least 2 max \{ \lceil \frac{flow_u(e)}{k}\rceil, \lceil \frac{flow_d(e)}{k}\rceil \} in any feasible Dial-a-Ride tour. We call this the flow lower bound.

Notes:

  • The flow lower bound provides a positive lower bound only for essential edges. The Steiner tree lower bound provides a positive constant lower bound of 2 for all essential edges, and, in case \cup_{i=1}^{m} p_i is disconnected, for certain non-essential edges.
  • The flow lower bound is always at least as strong as the Steiner tree lower bound for essential edges.
  • The two bounds can be combined in the obvious manner to give a stronger lower bound.
  • For infinite or arbitrarily high capacity k, the flow lower bound translates to a lower bound of 2 for all essential edges and zero for non-essential edges. The Steiner tree bound remains 2 for all edges in Q.
  • For k=1, the flow bound translates expectedly to 2 max\{flow_u(e), flow_d(e)\}. Intuitively, it is clear that as the capacity of the vehicle decreases, the Steiner tree lower bound, which does not take this into account, becomes progressively weaker as compared to the flow bound for essential edges.
  • Intuitively, the flow bound is concerned with the capacity of the vehicle for the flow of items across essential edges. The Steiner tree bound is concerned with the connectivity requirement in order to complete the tour using a least cost superset of essential edges.

References:

M. Charikar, B. Raghavachari: The Finite Capacity Dial-a-Ride Problem, 2001.

On the Dial-a-Ride problem

Dial-a-Ride problem: Given an undirected graph G = (V,E) with a non-negative length function l : E \rightarrow \mathbb{R}^{+}, a set of m ordered source-destination pairs (demand pairs) \{(s_i, t_i)\}_{i=1}^{m} \subset V \times V (each requiring an item to be carried from s_i to t_i), and a vehicle with capacity k, find a minimum length tour which satisfies all demand pairs such that the number of items carried by the vehicle during any point in the tour does not exceed k.

  • The problem definition may or may not specify a particular starting vertex for the tour.
  • Dial-a-Ride is a restricted version of k-delivery TSP. k-delivery TSP allows an item from any particular source to be carried to any of the specified destinations, and hence generalizes Dial-a-Ride.
  • Dial-a-Ride generalizes TSP, and hence is NP-hard. Consider a TSP instance, G=(V,E). Mark any one of the vertices in V as the source vertex, and mark the remaining vertices |V|-1 vertices as destination vertices, to obtain |V|-1 demand pairs. Set the capacity of the Dial-a-Ride tour vehicle to be arbitrarily high (infinite), and set the starting vertex as the one selected to be the source. A Dial-a-Ride tour on the reduced instance corresponds to an optimal TSP tour on the original instance.
  • There are two versions of the Dial-a-Ride problem. The preemptive version allows an item to be dropped off by the vehicle at an intermediate location (different from its destination). The non-preemptive version is more restrictive and does not allow this freedom.
  • Metric version: Here the underlying graph is an n-vertex metric space (V,d).
  • Dial-a-Ride on the tree metric: The source and destination points are assumed to be on the leaves. Apart from this restriction, the problem statement remains the same as before.

On the TSP and related problems – 2

3. k-MST: (general (as opposed to metric) version) :- Given an undirected graph G= (V,E), with non-negative edge lengths l : E \rightarrow \mathbb{R}^{+}, find a tree of minimum length that spans at least k vertices.

Notes on the k-MST problem:

  • There are two versions of the k-MST problem. The rooted version specifies a root vertex, r, which must necessarily be included in the tree. The unrooted version does not place any such restriction.
  • An \alpha-approximation algorithm for unrooted k-MST gives an \alpha-approximation algorithm for rooted k-MST. Consider an instance of rooted k-MST. Add |V| additional vertices to G, and an edge of zero length between the root and each of them. Clearly, an optimal solution of the unrooted version requiring the tree to span at least |V| + k vertices in the reduced instance, is an optimal solution for the rooted version for k vertices. Hence, an \alpha-approximation algorithm for unrooted k-MST implies an \alpha-approximation algorithm for rooted k-MST.
  • An \alpha-approximation algorithm for rooted k-MST gives an \alpha-approximation algorithm for unrooted k-MST. Consider an instance of unrooted k-MST. Run the \alpha-approximation algorithm for rooted k-MST on all vertices. The least cost solution among these gives an \alpha-approximate solution to the unrooted version. ( The least cost optimal solution over all vertices (as roots) for the rooted version is an optimal solution for the unrooted version. Hence, the least cost \alpha-approximate solution over all vertices for the rooted version gives a solution that is within \alpha-factor of the optimal for the unrooted version.)
  • Budget version of k-MST: Given an undirected graph G= (V,E), a non-negative length function l : E \rightarrow \mathbb{R}^{+}, and a budget B, find a tree that spans the maximum number of vertices subject to the condition that the length of the tree is at most the upper bound B.
  • Garg [2005] gave a 2-approximation algorithm for k-MST.
  • Metric version of k-MST: Given an n-vertex metric space, (V,d), and a number k, find a minimum length tree that spans at least k vertices.

4. k-forest: Given an undirected graph G= (V,E) with non-negative length function l : E \rightarrow \mathbb{R}^{+}, a set of m demand pairs \{s_i, t_i\}_{i=1}^{m} \subset V \times V, and a number k, find a least cost subgraph that connects (satisfies) at least k demand pairs.

  • Rooted k-MST reduces to k-forest. (k-forest generalizes rooted k-MST.) Consider a rooted k-MST instance on G= (V,E) with root r. Fix |V| - 1 demand pairs \{r,v\} where v \in V - \{r\}. An optimal solution satisfying k-1 demand pairs in this reduced instance is also an optimal solution for rooted k-MST on the original instance. Also, clearly, an \alpha-approximation algorithm for k-forest gives an \alpha-approximation algorithm for rooted k-MST.
  • Unrooted k-MST reduces in polynomial time to k-forest. This follows from the reductions from the unrooted k-MST to rooted k-MST, and from rooted k-MST to k-forest. Again, an \alpha-approximation algorithm for k-forest gives an \alpha-approximation algorithm for unrooted k-MST.
  • Metric k-forest: Given an n-vertex metric space (V,d), a set of m demand pairs \{s_i, t_i\}_{i=1}^{m} \subset V \times V, and a number k, find a least cost subgraph satisfying at least k demand pairs.
  • Metric versions of rooted and unrooted k-MST reduce in polynomial time to metric k-forest.

References:

  • Naveen Garg: Saving an epsilon: A 2-approximation for the k-MST problem in graphs, 2005.

Notes on the TSP and related problems

Here are the problem definitions and some notes on the metric versions of the Traveling Salesman Problem and related problems.

A metric space is defined as an ordered pair (M,d), where M is a set and d: M \times M \rightarrow \mathbb{R} is a distance function satisfying the following properties:

  • d(x,y) \geq 0 \forall x, y \in M.
  • d(x,y) = 0 iff x = y.
  • d(x,y) = d(y,x).
  • d(x,y) + d(y,z) \geq d(x,z).

1. TSP: Given a n-vertex metric space (V,d) and m points in V, find a minimum length tour that covers all m points.

Notes on the TSP:

  • No point that is not one of the specified m points will be visited by the optimal TSP tour. This follows directly from the triangle inequality (visiting a point, v that is not one of the m specified points, can be short cut by bypassing v to obtain a tour that is shorter than one that visits v).
  • The optimal tour is a cycle, i.e. no point is visited more than once. This, again, follows from the triangle inequality.
  • TSP is NP-hard.

2. k-delivery TSP: Given an n-vertex metric space (V,d), a Source set S \subset V, a Destination set T \subset V, and a number k, such that S \cap T = \phi, \sum_{s \in S} x_s = m and \sum_{t \in T} y_t = m, where x_s represents the number of items available (i.e. capacity) at source point s, and y_t represents the requirement at destination point, t, find a minimum length tour of a vehicle that transports all m items from source points to the destination points such that the vehicle carries at most k items at all times during the tour.

Notes on the k-delivery TSP:

  • Implicit in the problem statement are the assumptions that all source points in S are identical, and that all destination points in T are identical, and that all items are identical. In other words, any item can be transported from any source to any destination.
  • The problem can require the tour to start at a particular point. Intuitively, it seems that, unlike in the case of metric TSP, having a fixed starting point can result in a longer tour than what may be obtained given the freedom to choose any starting point.
  • There are two versions of this problem. The preemptive version allows the vehicle to drop off an item at an intermediate point that is not a destination point. The non-preemptive version does not allow this freedom.
  • For the non-preemptive version of the problem, the optimal tour would involve only the points in S \cup T. This follows from the triangle inequality.
  • k-delivery TSP generalizes TSP and hence is NP-hard.
    Consider a TSP instance requiring a least cost tour covering m specified points. Choose an arbitrary point, say r,  among the specified points, designate it as a source point with capacity m-1, and designate the remaining m-1 points as destinations with unit requirement. An optimal k-delivery TSP tour for this instance corresponds to an optimal TSP tour.
  • An application of k-delivery TSP: Suppose that Coke has a certain number of storehouses in a particular town, from where it needs to transport cartons of bottles to stores across the town. A carton can go from any storehouse to any store, and there is only one vehicle for carrying the cartons, and it can carry at most a certain number of cartons at any time.

Multiway cut – 2

In the previous post on this topic, we saw an algorithm with an approximation guarantee of 2 for the multiway cut problem. Some guidelines were also given on developing a factor (2-2/k) approximation algorithm based on the ideas of the original algorithm.

Let us briefly review that.

Problem: Given a set, S = {s_1, s_2, ..., s_k} of k vertices, find a cut of minimum size (in terms of weights of edges) that separates each s_i from all other vertices in S.

Algorithm:

1. Find an isolating cut for each s_i . Call each such isolating cut C_i .

2. Output the union of the k-1 lightest isolating cuts. (i.e. exclude the heaviest of the C_i's and output the union of the rest.)

The above set of edges is a multiway cut.

We saw how the set of edges C_1 \cup C_2 \cup ... \cup C_k formed a 2 – factor approximation to the optimal cut. Using this fact, convince yourself that the above algorithm outputs a (2-2/k) factor approximation of the optimum.

Another multiway cut algorithm:

We now present a mulitway cut algorithm that performs no worse than the above (2-2/k) factor algorithm, and indeed performs better than the above in examples which we will see.

The algorithm inherits the idea of finding an isolating cut for each vertex in the set, S. Then, it performs a greedy choice, and picks the isolating cut, C_i having minimum size. It then removes the edges of cut, C_i from the graph, G to get a new graph G’. The vertex s_i is removed from the set, S. The same procedure is repeated on this new graph. (i.e. the isolating cuts are again computed for each of the k-1 vertices left in S, and the minimum cut is selected for removal from the graph.)

Once the above procedure is repeated k-1 times, we get the required multiway cut, which has an approximation guarantee of (2-2/k), and which additionally performs better than the above (2-2/k) factor algorithm in specific test cases.

The algorithm is repeated below step-wise.

1. Let G_1 = G. and S_1 = S.

Initially, i = 1. The required multiway cut, C is initially the Null Set.

Repeat the following (Steps 2-7) k-1 times.

2. Compute isolating cuts (wrt vertices in S_i ) – C_i for each vertex in S_i.

3. Let C_j be the minimum of the cuts computed in the previous step.

4. Remove the edges of C_j from G_i to get a new graph, G_{i+1} .

5. Add the edges of C_j to C. i.e. C = C \cup C_j.

6. Remove vertex, s_j from S_i to get a new set, S_{i+1}.

7. Increment i by one.

End of algorithm.

The set, C is the required multiway cut.

—–

Things to think about:

1. Convince yourself that k-1 iterations are sufficient to generate a multiway cut. (i.e. after k-1 iterations, each of the vertices in set, S is disconnected from all other vertices in S.)

2. Can you see why this algorithm will generate a cut that is no worse than the cut produced by the earlier (2-2/k) factor algorithm.

At each step , we remove a particular vertex from the set, S, and recompute the values of the isolating cuts for the remaining vertices, based on this new S. Can you see why this could lead to a better isolating cut for a particular vertex in S than was possible under the earlier algorithm.

Let us consider an example:

multiway_alg2

Example for multiway cut: the edge weights are indicated alongside the edges.

In the above example:

The minimum isolating cut for s_3 has value = 20. (One such candidate for C_3 is the set of edges { {s2,s3), {s3,C}}.
The minimum isolating cut for s_1 has value = 15. There can be more than one such cut with value = 15. Let the choose this one: {{A,B}, {A,C}}.
Similarly, value of minimum isolating cut for s_2 is 15; and one such cut is given by: {{s2,B}, {s2,s3}}.

Now, how would the first algorithm proceed?

The union of the 2 lightest isolating cuts is outputted. Hence, the multiway cut produced by the algorithm is: {{A,B}, {A,C}, {s2,B}, {s2,s3}}. The value of this cut is 30.

How does the 2nd algorithm perform in this example?

Take a minimum value cut from the set of isolating cuts and add it to C. We can add either C_1 or C_2 since both have equal minimum value. Suppose, we add C_1 to C. Now, C = {{A,B}, {A,C}}.
Remove edges of C_1 from G, and recompute minimum isolating cuts for s_2 and s_3 on the new set, S = {s2, s3}.
We get a minimum isolating cut of value = 10 for both s_2 and s_3 . This is given by the edge {s2,s3}.
Add this edge to C, and we are done.

Therefore, C = {{A,B}, {A,C}, {s2,s3}}.

Value of this multiway cut = 25.

Hence, we can see how the 2nd algorithm performs better than the first in this example.

On a separate note, convince yourself that the second algorithm will perform no worse than the first on any input graph.

———

Another example (here the second performs asymptotically twice as better as the first):

Example for multiway cut

Example for multiway cut

Note: In the above diagram, the letter e has been used to denote \epsilon > 0 .

The set, S consists of k vertices, s_1 through s_k as shown. (The diagram is for k=5)

Performance of first algorithm on above instance:
Value of C_1 = 2 - 2 \epsilon .
For all other vertices in S, value of C_i = 2 - \epsilon.

Therefore, the value of cut produced by algorithm = (k-2) \times (2-\epsilon) + (2 - 2 \epsilon).

Performance of second algorithm on above instance:
Value of minimum C_i is 2- 2 \epsilon given by C_1 . On removing C_1 from G and s_1 from S, we find that the values of all other isolating cuts become equal to one. (See the diagram.) We proceed through the algorithm as specified to get cut, C.

Value of cut, C thus produced = (k-2) \times 1 + (2 - 2 \epsilon) .

Verify that for large values of k, as \epsilon —> 0, the ratio of values of the two cuts approaches 2.

Hence, the second algorithm performs twice as better asymptotically in this case.

3. Consider this example:

A tight example for both algorithms.

A tight example for both algorithms.

(Note: The letter, e has been used in the diagram to represent, \epsilon >0 .

Verify for yourself the following:

1. The optimal multiway cut for the above instance is of value = k. (Consider removing all edges of the cycle.)

2. The first algorithm produces a cut of value = (k-1) \times (2 - \epsilon) .

3. The second algorithm produces a cut of value = (k-1) \times (2 - \epsilon).

4. Asymptotically, both algorithms achieve an approximation factor of (2-2/k).

(I’m sorry for any errors in this post.)

Multiway Cut

A cut, C in a graph, G= (V, E) is a set of edges, whose removal from G disconnects G. i.e. G’ = (V, E – C) is disconnected. (A graph is said to be connected if there exists a path between any two vertices in the graph. Hence, a graph is said to be disconnected if there exist two vertices in the graph, say u and v, such that there exists no path between u and v in the graph.)

An s-t cut, C is a set of edges whose removal from the graph, disconnects s and t.

(Note: Throughout this discussion, we shall talk only about undirected, edge-weighted, connected graphs. Also, all edge weights are integers.)

Size of a cut: This means the total weight of all edges forming the cut.

We want to disconnect the graph; but we want to do so by removing as few (in terms of weight) edges as possible. That, we wish to find a cut of minimum size. Such a cut is called a min – cut.

Example for min – cut:

Min Cut example: All edges in the graph have weight = 20, other than the 3 highlighted edges. Those 3 edges form the min cut.

Min Cut example: All edges in the graph have weight = 20, other than the 3 highlighted edges. Those 3 edges form the min cut.

An s-t min cut, as is clear from the name, is a cut of minimum size that separates vertex s from vertex t.

Example for s- t min cut:

Example for s-t min cut

Example for s-t min cut

(The highlighted edges form an s-t min cut of value = 40. Note that the min cut value for this graph is 5 (obtained by removing edge {a,b}.)

=========

A multiway cut is a generalization of an s-t cut.

Here, instead of two vertices, s and t, we have to separate a set of k vertices, {s_1, s_2, ..., s_k} from each other.

In other words, find a set of edges, say C, such that on removing C from the graph, there exists no path between s_i and s_j for any i, j \in {1,2, ..., k} .

Our task is to find a multiway cut of minimum size.

Example of multiway cut:

By removing all edges shown in the figure, we can separate vertices, s1, s2, .., sk, from one another.

By removing all edges shown in the figure, we can separate vertices, s1, s2, .., sk, from one another. (Note that within each ellipse around each si, there may be a number of vertices and edges between those vertices. They have not been shown for sake of clarity.)

(The above example shows a multiway cut.)

========

Question: How do you find a multiway cut of minimum size?

It turns out that it is hard to find such a cut in polynomial time. But that’s not the end of the story. What we can do efficiently (in general, an algorithm is efficient if it runs in polynomial time) is to find a multiway cut that is within a factor of 2 from the optimal. In other words, if the value of the optimal multiway cut of a graph, G is OPT (G) , our algorithm will produce a cut of value \leq 2 \times OPT (G) for every graph, G.

(So, even though we don’t find an optimal cut, we find one that is guaranteed to be “close” to optimal.)

========

Even though, we do not know of any efficient algorithm, for solving a multiway cut problem exactly for arbitrary values of k, we do know how to solve it efficiently for k = 2. (Note that for k = 2, the multiway cut problem is just the s-t cut problem. An optimal s-t cut can be found efficiently. [It is actually quite simple to do it; but that is not our concern here.] )

We will use this efficient s-t min cut algorithm to find our approximate multiway cut.

————–

Let us review once again what we want. Let us denote by S the set of k vertices, {s_1, s_2, ..., s_k} .

We want to remove a set of edges, such that on their removal:
- there exists no path from s_1 to s_2 , s_1 to s_3 , and so on till s_1 to s_k .
- there exists no path from s_2 to s_1 , s_2 to s_3 , and so on till s_2 to s_k .
- there exists no path from s_3 to any of the other vertices in S.
- and so on, for each vertex in the set, S.

The above explanation of the problem actually provides us with a straightforward algorithm for the same.

Take vertex, s_1 . Find a set of edges, removing which would disconnect s_1 from every other vertex in S.
Let C_1 be such a set of minimum size (i.e. minimum total weight).

How do we find C_1? Simple; we want a min cut that separates s_1 from all other vertices of S. What we’ll do is merge all these other vertices of S into one single vertex, call it t. Find an s-t min cut with s_1 as the vertex, s, and the merged vertex as t.

(Merging: When we merge any two or more vertices, the edges between these merged vertices disappear. The other edges that were incident upon these vertices, remain as such, and are now incident upon the new merged vertex.

Example for merging vertices:

The graph on the right is obtained by merging vertices, C, D, E and F into one "super-vertex" denoted by {C,D,E,F}. Note that there are multiple edges between the "super-vertex" and some other vertices of the graph.

The graph on the right is obtained by merging vertices, C, D, E and F into one "super-vertex" denoted by {C,D,E,F}. Note that there are multiple edges between the "super-vertex" and some other vertices of the graph.

(Each of the edges {F,G} and {E,G} translates into an edge between the super-vertex and G. Hence, there are 2 edges between {C,D,E,F} and G.)

Therefore, using the efficient algorithm for finding an s-t min cut, we can compute C_1 . On removing the edges of C_1 , vertex s_1 will be disconnected from all other vertices in S.

So, the first part of our job is done (disconnecting s_1 from all others in S.)

(Note: We call C_1 as an “isolating cut” for vertex, s_1 .)

Similarly, find isolating cut C_2 corresponding to vertex s_2 . On removing the edges of C_2 , s_2 is disconnected from every other vertex of S.

Therefore, by removing the union of C_1 and C_2 , we can disconnect s_1 and s_2 from every other vertex in S (and obviously, from each other also).

Now, you get an idea of how to proceed, right?

Find an isolating cut C_i corresponding to each vertex, s_i in the set, S.

The union of all these isolating cuts, i.e. C_1 \cup C_2 \cup ... \cup C_k is a multiway cut separating s_1, s_2, ..., s_k from one another.

=========

Question: We had claimed that the cut produced by the algorithm is always within a multiplicative factor of 2 within the optimal. How do we prove this?

Let us call the cut produced by the above algorithm as C. i.e. C = C_1 \cup C_2 \cup ... \cup C_k .

Let us call the optimal cut as A.

Now, on removing the edges of A from G, we would get k (disjoint) connected components in the graph, one connected component for each vertex, s_i .

Let A_1 be the subset of edges from A, that is incident upon the connected component of s_1 .

Graph showing A1. (A1 is the set of dashed edges.) (Note that within the connected components of vertices other than s1 have not been shown. Also, the edges between the various connected components other than the one containing s1 have not been shown, for sake of clarity.)

Graph showing A1. (A1 is the set of dashed edges.) (Note that vertices and edges within the connected components of vertices other than s1 have not been shown. Also, the edges between the various connected components other than the one containing s1 have not been shown, for sake of clarity.)

As is quite obvious, A = A_1 \cup A_2 \cup ... \cup A_k .

It is also clear that each edge of A appears in exactly two A_i 's . (If an edge of A is between the connected component of vertex s_i and s_j , then it appears in the sets A_i and A_j . )

From this, we have that:

\sum_{i=1}^{k} w(A_i) = 2 \times w(A) .  —————————Eqn. (1)

Here, w(X) denotes the total weight of edges in set, X.

Now, we know that C_1 is an isolating cut of minimum size for s_1 .

Therefore, w(C_1) \leq w(A_1) .

Similarly, w(C_i) \leq w(A_i) for all i \in {1, 2, ..., k} .

Therefore, \sum_{i=1}^{k} C_i \leq \sum_{i=1}^{k} A_i .

Combining the above equation with Eqn. (1), we get that:-

\sum_{i=1}^{k} C_i \leq 2 \times w(A) .

i.e. the cut produced by our algorithm is always within a factor of 2 of the optimal multiway cut.

In other words, we are done. :)


Points to ponder:

1. Do we necessarily need to take the union of all k sets C_1 till C_k to obtain a multiway cut? What if we take the union of only k-1 of these sets? Will that union be a valid multiway cut?

2. As an extension of the above point: which k-1 sets should we choose in order to minimize the size of the cut produced? (Hint: you can discard the heaviest among the C_i 's and output the union of the rest.) We know that outputting the union of all k sets results in a factor-2 approximation algorithm. What is the approximation factor for this improved algorithm? (Hint: Prove that the approximation factor is (2 – 2/k). )

(The topic of multiway cut is dealt in Chap. 4 of “Approximation Algorithms” by Vijay Vazirani.)

Max-Cut Problem

The Maximum Cut Problem asks us to find an edge-cut of maximum cardinality. (We are considering undirected, unweighted graphs.) In other words, we are required to find a partition of the vertex set, V into (X^*, V-X^*) such that the number of edges crossing the partition is maximized.

The algorithm presented here is based on a technique known as local search.

Algorithm:

Take any arbitrary partition of the vertex set, (X, V-X). Find a vertex which when flipped to the other set of the partition would have more of its incident edges crossing the cut.

Flip the vertex to the other set of the partition. Continue searching for such a vertex in this new partition. Continue this procedure generating new partitions until no such vertex satisfying the above condition exists.

—-

Consider this example:

graph4

We are given an arbitrary cut as shown above. Consider vertex, 1. Out of its 3 incident edges, 2 are incident on vertices of S1 (its current set), and one is incident on a vertex of set S2. Hence, by moving vertex, 1 to set, S2, we can improve the cardinality of the cut by one.

This is what we get by moving vertex,1 :-

Cut improved by flipping vertex,1
Cut improved by flipping vertex,1

Observations on the algorithm:

1. Each “flip” of a vertex from one set of the partition to the other increases the size of the cut by at least one.  (If vertex, 1 is incident on x_1 vertices in set S_1 , and x_2 vertices in set S_2 , (and x_1 > x_2 ) then flipping vertex, 1 from S_1 to S_2 would increase the size of the cut by x_1 - x_2. )

2. The above point shows why the algorithm runs in polynomial time. The maximum possible size of a cut of G=(V,E) is |E|. From Observation 1, we can conclude that the algorithm makes at most |E| flips. Further, each flip can be done in polynomial time. Hence, the algorithm is of polynomial time complexity.

3. What is the minimum number of edges that the algorithm returns in its cut? In other words, give a lower bound on the size of the cut returned by the algorithm.

This is simple to compute.

Let (X, V-X) be the cut returned after the termination of the algorithm. (Termination of the algorithm implies that flipping any vertex from one set of the partition to the other would not produce a better cut.)

Consider any vertex, v in this cut. At least half of the edges incident on vertex, v in graph G would be crossing (X, V-X). (since otherwise it would mean that flipping v would improve the cut returned by the algorithm, indicating that the algorithm has not yet terminated i.e. contradiction.)

Therefore, at least half of all edges of G cross the cut returned by the algorithm.

|C| \geq |E|/2

where |C| represents the size of the cut returned by the algorithm.

4. Consider the following example:

For the above graph, let an arbitrary cut be given by:

Arbitrary cut used as starting point for algorithm
Arbitrary cut used as starting point for algorithm

(Edges within sets S1 and S2 have been omitted for clarity.)

If we start our algorithm with this cut as a starting point, we find that we can find no vertex which is suitable for flipping.  ( Consider vertex, a: Flipping it would make it incident  on n/2 vertices of S1. But it is already incident on n/2 vertices on S2. Flipping it will not improve the cut. Same is the case with vertex, b. Similarly flipping any of 1,2,..,n would improve the cut.)

Hence, this is the best cut that the algorithm can produce when provided with this particular cut as a starting point.

Size of cut produced by algorithm = n

But this is certainly not a maximum cut for the graph. Consider the following cut :-

A Max cut for the graph
A Max cut for the graph

Size of above cut = 2n.

This is clearly the maximum cut possible for the graph, since |E| itself is 2n.

Hence, our algorithm produces a cut that is factor of 1/2 of the optimal in this particular instance.

( Please note that the output of the algorithm for a particular input graph varies according to the arbitrary cut that is chosen as its starting point. )

5. The best possible cut can be of size |E|.

From Observation 3, it is known that the algorithm produces a cut of size at least |E| / 2.

Hence, the algorithm is a factor- 1/2 approximation algorithm. (i.e. it produces a cut that is always within a factor of 1/2 of the optimal)

Observation 4 indicates that the factor of 1/2 is indeed a tight bound on the performance of the algorithm through an example where the algorithm performs no better than than a factor of 1/2. (Such examples are known as “tight examples“).

On the Shortest Path and the MST Problems

This is a pre-submission copy of my undergraduate thesis “On the Shortest Path and Minimum Spanning Tree Problems”.

On the Shortest Path and Minimum Spanning Tree problems

This is more like a restricted survey on some algorithms for computing shortest paths and MSTs, and is quite elementary.

Vertex cover Problem

Here we’ll see a simple algorithm that finds a near-optimal vertex cover of a graph.

Firstly, what is a vertex cover? A vertex cover is a set of vertices, call it S, such that the following condition holds: every edge of the graph has at least one end-point in S.

( i.e. we are “covering” all the edges of the graph through the vertex cover.)

Example: If vertex, x is adjacent to vertices, a, b, and c, then it “covers” edges {x,a}, {x,b} and  {x,c}.

What we ideally want is a vertex cover having the minimum number of vertices, i.e. we want to “cover” all the edges of the graph using as few vertices as possible.

That would an optimal vertex cover. So, what is meant by “near-optimal”? This means that the vertex cover we find is not exactly one of the minimum possible cardinality. However, it also means that this vertex is close in size, say by a constant multiplicative factor, to the optimal one.

Why should we settle for a solution that is not as good as optimal? The reason is simple: some questions are inherently difficult to solve exactly. Also, for some problems, a near-optimal solution is good enough.

(Note that we only consider graphs that do not have any isolated vertices.)

We digress here a bit to introduce another concept.

Consider a graph, G. Pick an edge of G. Now, pick another edge of G, that is not adjacent to the edge picked earlier (i.e. the two edges do not have any common end-points). These two edges are said to be disjoint.

A matching in a graph is a set of disjoint edges.

What we would like to see is how many edges we can possibly have in a matching.  A maximum matching is a matching having the maximum number of edges that any matching of that particular graph can possibly have. (Note that we say “a” maximum matching, instead of “the” maximum  matching, since a graph can have multiple maximum matchings.)

The vertex cover algorithm is based on two simple ideas.

Idea 1If we take the end-vertices of all edges forming a maximum matching of a graph, G, then this set of vertices, say S, forms a vertex cover of G.

This is simple to prove. Assume there is an edge {x,y} which is not “covered” by S, i.e. neither x nor y is in S. In that case, we can add edge {x,y} to get a new matching. This matching has a size greater than that of our maximum matching, which is not possible. Hence, all edges of G are “covered” by S.

As stated earlier, we are interested in finding a vertex cover having minimum number of vertices. How does set S fare on this account?

Idea 2: The size of a maximum matching of G is a lower bound on the size of a vertex cover of G.

i.e. The best possible vertex cover of G (one with least possible number of vertices) will have at least ‘k’ vertices, where ‘k’ is the number of edges in a maximum matching of G. ( k = |S| / 2, in our  terminology.)

This is also easily proved.  Let’s call the vertex cover as T. Now, every edge of the maximum matching, call it M, must be “covered” by T. So, every edge of M must have at least one end-vertex in T. Also, there is no overlap whatsoever among the end-vertices of any two edges of M. Therefore, T must contain at least |M| vertices.

We can’t obtain a vertex cover with lesser number of vertices.

“Idea 1″ showed how to obtain a vertex cover of size, |S| = 2 |M|.  And “Idea 2″  showed that a minimum vertex cover needs to have at least |M| vertices.

So, what we obtained through “Idea 1″ is a vertex cover that is within a factor of two from the optimal. i.e. The number of vertices it contains is no more than twice the number of vertices contained in an optimal vertex cover.

Such an algorithm is known as a factor-2 approximation algorithm.

Notes:

1.  There exist algorithms that can find a maximum matching in polynomial time. (In algorithmic terms, a polynomial running time is considered to be “efficient”.) On the other hand, we do not know of any such efficient algorithm for obtaining an optimal vertex cover. Hence, we use the matching problem to develop an efficient approximation algorithm.

2. Suppose a network of roads in a town is modeled as a graph, with the streets representing edges, and intersections between streets being the vertices. We are asked to place policemen at the street intersections such that all streets are monitored by at least one policeman. ( A policeman can monitor all streets that begin/end at his intersection.) We want to use as few policemen as possible. This is the same as the optimal vertex cover problem.

Probabilistic Method

This is a method by which we can prove that a combinatorial object exists with non-zero probability. The interesting thing is that we need not even find any combinatorial object in order to prove its existence.

We’ll consider 2 problems here: the max-SAT and the max-cut problem.

(1) Max-cut: Find a cut in the graph with the maximum number of edges crossing the cut.

This problem is NP-hard. But we can accomplish another thing easily. We can prove that there exists a cut with at least m/2 edges crossing it. (m and n are the edges and vertices in the graph, respectively).

How do we prove that? Simple. Use the probabilistic method.

Assign each vertex of the graph, G independently and equiprobably to one of two vertex subsets. Now, consider any edge (u,v) belonging to G. What is the probability that its ends are in different subsets? Half. (Why? Because, there are 4 cases,(1) u is in subset, A and v is in subset, B; (2) u is in subset, A, and v is in subset, A; (3) u is in subset, B, and v is in subset, A; (4) u is in subset, B, and v is in subset, B. Each of these 4 events has probability 1/4. )

So now, what is the expected number of edges of G  crossing the cut? Answer: m/2 (by linearity of expectation.)

We know one simple fact: If the expected number of edges is m/2, then there exist cuts having at least m/2 edges crossing the cut. (Same as saying: if class average is 25, then  someone got at least 25).

Using the above fact, what have we just proved? There exists a cut having at least m/2 edges.

(2) Max-SAT: We have a set of m clauses, each in conjunctive normal form. What is the maximum number of clauses that can be satisfied?

We wont solve that. Instead, we’ll prove that there exists an assignment of truth values such that at least m/2 clauses are satisfied.

Proof: There are n variables in all, and m clauses. Assign truth values independently and equiprobably to each variable. Consider any one clause which has k variables. What is the probability that this clause is not satisfied? Answer: At most 2^(-k).

So, the probability that the clause is satisfied is at least 1 – 2^(-k), which is greater than or equal to 1/2.

Again, by linearity of expectation, the expected number of satisfied clauses is at least m/2.

Using the probabilistic method, we’ve proved the existence of two combinatorial objects. For more on this, read Ch. 5 of Randomized Algorithms by Motwani and Raghavan.

Shortest Paths

A path, as the name suggests connects 2 vertices in a graph. A shortest path is a path of minimum length.

If we have a particular vertex, say s, and need to find shortest paths from s to all other vertices of the graph. It is called the Single Source Shortest Path Problem (SSSP).

We may also want to find shortest paths between every possible pair of vertices in the graph. This is the All Pairs Shortest Paths Problem (APSP).

How do we find a shortest path between 2 vertices, say s and v?

Simple. Proceed in a manner that would remind us of Prim’s algorithm, which was discussed in the previous post.

We find the shortest path by sort of growing a cloud. Initially, we got only s. Now, choose that vertex which is closest to s. So now the cloud has 2 vertices, and the “diameter” of the cloud is the distance of this newly-added vertex, say x,  from s.

Next step. Now, find a vertex that is father from s than x, but is closer to s than any of the other vertices (or at least no farther than any of them). Call this vertex, y. Now, the path from s to y, will either be direct (i.e. one edge) or would be the addition of edge (x,y) to the path from s to x.

We continue in the same manner. Lets see one more step.

Suppose that the most recent vertex added to the cloud is d, and that the shortest path from s to d is s-> a -> b -> c -> d. How do we choose the next vertex to be added to the cloud?

Select the vertex that is farther from s than d (or at least no closer) and is closer to s than any other vertex outside the cloud (or at least no farther).  This vertex, say e, will have a shortest path from s, that is either a direct path (s->e), or an indirect one (s -> a -> e, or s-> a ->b ->e, or s-> a -> b ->c ->e, or s-> a -> b -> c-> d-> e).

Keep on adding vertices to the cloud until vertex v, also enters the cloud.

For the SSSP problem, keep adding vertices until the all vertices have been added.

This is known as Dijkstra’s algorithm.

It’s quite simple and also very well-known. Incidentally, it was discovered 50 years ago (1959). :)

Minimum Spanning Tree

A Minimum Spanning Tree is a tree that includes all the vertices of the graph. (Tree is a graph that is connected and has no cycles.)

So how do we find a MST of a graph? Standard textbook approach would be to use a greedy algorithm.

(Greedy algorithm: An algorithm where thinking greedily helps you to find the solution :) i.e. local optimal solutions lead to a global optimal solution.)

We’ll see three ways to find the MST.All are greedy.

1. Take the entire graph and start deleting edges one by one. Now, we have to be greedy. So, select the heaviest edge that is left in the graph, and delete it. Now, we also have to ensure that the resultant graph spans the graph. So, do not delete any edge that causes the graph to become disconnected; instead check for the next lightest edge to delete.

So, keep deleting edges starting from the heaviest to the lightest keeping in mind the above rules.

Till when do we continue this deletion??  Well, do it till you can delete no more edges without disconnecting the graph. Thats it. We have a MST.

2. Initially MST is empty. Add edges to this to get the MST. How to add edges? Again, greedily. Take the lightest edge. Add it to the MST. Now, we also need to ensure that the resultant MST remains a tree, i.e. there should be no cycles. So, while adding edges, do not add edges that cause the MST to have a cycle. Instead, go to the next heavier edge to add.

So, keep adding edges successively starting from the lightest to the heaviest, keeping in mind the above rules.

When do you stop? Simple; when addition of any more edge to the MST creates a cycle.

3. Initially the MST is empty. And again we’ll add edges greedily. Add the lightest edge to the MST. You get a connected component of 2 vertices, right? Now add the lightest edge that is incident upon this connected component. Ensure that the added edge does not create a cycle in the MST. In case the edge creates a cycle, skip adding it and proceed to the next heavier edge incident upon the connected component.

Keep adding till no more edges can be added without creating a cycle. There’s the MST.

The 2nd algorithm is called Kruskal’s algorithm, and the 3rd is called Prim’s algorithm.

Follow

Get every new post delivered to your Inbox.