Numerical computations in C++ : Converting from Decimal to Binary

Converting from decimal to binary notation is fairly standard and straightforward.

Let’s say we need to find the binary representation of 100_{10}.

The method that most of us would have been taught in school is the following:

- Divide 100 by 2. You get 50, with remainder = 0. Write 0.

- Divide 50 by 2, to get 25 and remainder = 0. Write 0.

- Divide 25 by 2 to get 12 and remainder = 1. Write 1.

- Continue doing so until the number (which is 25 at present) becomes 0.

- The binary representation of 100 is the reverse of what you have written.

——

We can write a recursive function to compute the binary representation of a decimal number. Using the power of recursion, we can avoid having to reverse the output as is necessitated by the above method.

——-

// Function dec_to_bin outputs the binary equivalent of the decimal number n.

void dec_to_bin ( int n)

{

if ( n < 0)

{

cout << “-”;

n = -n;

}

if (n == 0)

return;

dec_to_binary (n/2);

cout << n%2;

}

——–

dec_to_bin will take time proportional to the number of bits in the binary equivalent of n.

——–

Numerical computations in C++ : Converting from Decimal to Binary – 2

Earlier we looked at a recursive method for conversion from decimal to binary. We can also achieve the conversion in an iterative manner, however adding to the task the additional overhead of reversing the answer that we obtain.

The underlying method is the one discussed at the beginning of the previous post. At each iteration, we divide a number by 2 and store the corresponding remainder in an array. At the end of the process, we reverse the contents of the array.

Now, we are faced with a small problem. How do we know what the size of our array should be? Fortunately, we can proceed without knowing that, using the resizeable vector container class. (A vector is like a dynamic array. We need not specify its size at the time of definition.)

In order to reverse the vector, we can simply use the reverse algorithm defined in the algorithm header. Or if we simply want to output the binary representation (without needing to store it anywhere), we may simply use a reverse_iterator to iterate through our vector.

———–

// function dec_to_bin outputs the binary equivalent of the decimal input n.

#include <iostream>

#include <vector>

using namespace std;

void dec_to_bin ( int n )

{

vector <int> v; // vector of ints to store the binary representation.

int temp = n;

if ( n < 0)

temp = -n;

while ( temp != 0)

{

v.push_back (temp%2);

temp / = 2;

}

vector<int> reverse_iterator i;

if ( n < 0)

cout << ” – ” ;

for ( i = v.rend ( ); i != v.rbegin ( ); ++i)

cout << *i;

}

——-

If we used a list or a deque instead of a vector, we wouldn’t be faced with the task of having to reverse the values computed by us. We can do this by storing successive numbers that we generate at the front of the data structure instead of at the end.

We may do that using the push_front member function for lists and deques. (Note that we cannot use push_front for a vector. We can only insert and delete elements (efficiently i.e. in constant time) at the back of a vector.)

——

The above program using a deque along with push_front would look as follows:

——

#include <iostream>

#include <deque>

#include <algorithm>

using namespace std;


void dec_to_bin ( int n )

{

deque <int> v;

int temp = n;

if ( n < 0)

temp = -n;

while (temp != 0)

{

v.push_front ( temp%2);

temp /=2;

}

if ( n < 0)

cout << “-”;

copy (v.begin( ), v.end ( ), ostream_iterator<int> (cout,”");

}

——-

Instead of using an iterator to iterate through the deque, we can use the copy algorithm defined in the header algorithm to directly output the contents of the deque. (Note that the statement #include <algorithm> though written as part of the above program is not necessary, since algorithm.h automatically gets included when we write #include <deque>. )

——-

Numerical computations in C++ – Exponentiation by repeated squaring

Let’s say that we have to compute the power of a number without using the inbuilt pow function. We can compute the answer using the method of repeated squaring.

Firstly, we assume that the exponent is an integer.

For example. let us compute (13.5)^5.

Now, we know that 5 is 101 in binary notation.

Hence, what we need to compute is 13.5^4 \times 13.5^0 \times 13.5^1

We also observe that 13.5^4 = (13.5^2)^2 and that 13.5^2 = (13.5^1)^2.

From these observations, we can deduce that repeated squaring of 13.5 and multiplication of the appropriate intermediate results would lead us to the desired answer. (i.e. Compute 13.5^1, 13.5^2, 13.5^4 and take the product of 13.5^1 and 13.5^4.)

Here, prior to the repeated squaring, we calculated the binary representation of the given exponent. We need not necessarily do that. We can do the squaring and the computation of the binary representation simultaneously.

———

Here is a C++ code fragment to illustrate that:-

//function f takes a computes x^e (where e >= 0)  and returns a long value.

long f (int x, int e)

{

int temp = e, rem; //rem stands for remainder.

long answer = 1, term = 1;


while ( temp != 0)

{

term = term * x;

rem = temp%2;

if (rem == 1)

{

answer = answer * term;

}

temp / = 2;

}

return answer;

}

——–

If we assume that all arithmetic operations take O(1) time, then the above function takes O(n) time where n represents the number of bits in the binary representation of the exponent, e.

———

Dynamic programming – Probability, gambling and a die

Let’s say we have a fair 6-sided die. Player A can earn money based on what shows up when the die is rolled. The rule of the game is as follows: Player A will roll a die upto 3 times. After each roll, Player A has two choices: (1) He can take money in dollars equal to the number showing on the upturned face of the die. (i.e. if 5 shows on the die, he can take $5.)  In this case, the die is not rolled any more, and the game ends. (2) He can decide to discard this roll of the die, and decide to roll the die again. Note that the die can be rolled at most 3 times.

Question — What is the expected payoff for Player A?

——–

Let’s say that instead of 3 rolls of the die, the game allowed only for 1 roll. What is the expected payoff for Player A in this case? It is clearly equal to the expected value of the number that shows up when the die is rolled. And since the die is fair, this expectation is equal to (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. And Player A has no choice but to accept whatever shows up on the die because there is only one roll possible. Hence, in this game, the expected payoff for Player A is $3.5.

Now, consider what happens if instead of 1 roll, we allow for upto 2 rolls.

Now, if the 1st roll is a 6, obviously Player A would pocket $6 and decide to end the game there itself. Also, if the 1st roll is a 1, Player A can always decide to roll again [because irrespective of what shows up on the 2nd roll, it cannot be less than $1, and hence Player A has nothing to lose by deciding to discard the 1st roll and opting to roll again.]

Now consider what happens if the 1st roll is a 3. Now, Player A is faced with a predicament. What should he do? Should he decide to stop there and go home with $3, or should he discard this 3 in the hope of higher returns in the 2nd and final roll?

This is where having learnt probability well in school helps.

We know that the expected value of the number that shows up when the die is rolled once is 3.5.

Hence, if in the 1st roll, Player A gets a number greater than 3.5 i.e. either 4, 5 or 6, he can decide to stop there and take the corresponding amount of money.

However, if he gets either 1, 2 or 3 in his 1st roll, he should discard that, and decide to roll again, because the expected value of the 2nd roll is 3.5.

Hence, the expected payoff for Player A from a game which allows for at most 2 rolls is: \frac{3}{6} 3.5 + \frac {1}{6} 4 + \frac{1}{6} 5 + \frac{1}{6} 6 = 4.25.

(Here, 3/6 is the probability of getting a 1, 2 or 3 on the 1st roll, in which case the expected payoff becomes 3.5. )

——-

Now, coming back to our original game, we can decide as to how we should proceed after the die is rolled once.

We know that the expected payoff from 2 rolls of the die is $4.25.

Hence the strategy of Player A should be the following:

- If the 1st roll of the die yields either a 5 or a 6, take the money, and end the game.

- Else, discard the 1st roll, and continue the game.

The expected payoff can be computed easily based on the preceding observations:

Expected payoff for Player A = \frac{4}{6} 4.25 + \frac{1}{6} 5 + \frac{1}{6} 6 = \frac {14}{3}.

———-

This is a classic question using the principle of dynamic programming. Let’s say the the game allows for the die to be tossed upto n times.

Let E[k] denote the expected payoff for Player A in a game that allows upto n-k rolls of the dice.

The strategy for Player A after roll k should be the following:

– If the number showing on the dice is greater than E[k], take the money and stop the game.

– Else, discard that roll, and decide to continue on to the next roll.

——-

The recurrence relation for computing the E[k] values can be derived in terms of E[k+1] as was done in the case of the 3-roll example that we saw earlier. Once we have done that, we can calculate the values in the order, E[n-1], E[n-2] and so on uptil E[0].

——-

Discrete Math 101: Fun with Combinatorics – 2

3. {n \choose k} = {{n-1} \choose {k-1}) + {{n-1} \choose k}}

Our task is to select a team of k players from among n players. Now, we have 2 options – Either select player 1 to be part of the team, or leave him out of the team.

If we select player 1 to be part of the team, our task is reduced to selecting k-1 players from among n-1 players.

And if player 1 is not part of the team, our task is reduced to selecting k players from among n players.

Hence we have the above result.

———

4. {n \choose k} = {{n-1} \choose {k-1}} + {{n-2} \choose {k-1}} + { {n-3} \choose {k-1}} + \ldots.

Again our task is to select a team of k players from among n players – P_1, \ldots, P_n.

If P_1 is part of the team: We can select such a team in {{n-1} \choose {k-1}} ways.

If P_1 is not part of the team, and P_2 is part of the team: The problem is reduced to selecting k-1 players from among n-2 players.

If P_1 and P_2 are not in the team, and P_3 is in the team: The problem is reduced to that of selecting k-1 players from among n-3 players.

Proceeding in this manner, we get the above result.

———–

5. {n \choose k} = \frac {n}{k} {{n-1} \choose {k-1}}

Again, we have n players at our disposal from among whom we wish to select a team of k players.

Say we select P_1 to be the 1st player in our team. (Note that the term “first” actually means nothing since we wish to select a “set” of objects and not an “ordered list”. However the usage of such terminology is useful for the purposes of exposition.)

Now, all we need to do is to select k-1 players from among n-1 players.

Now, let’s go back a step. What if we select P_2 to be the 1st player of the team, and select the remaining players for the team in {{n-1} \choose {k-1}} ways.

We can see that we have n choices for the 1st player of a team.

Hence, we can select a team of k players in n \times {{n-1} \choose {k-1}}.

But again some teams are being repeated.

Given a fixed team, say T, we select that team k times (with each of the k players in the team being selected as the 1st player of the team). Hence we need to divide the previous result by k.

Hence we get the above result.

———-

Discrete Math 101 : Fun with Combinatorics -1

n \choose k is the number of ways we can select k items from a group of n items. Say the number of players in India’s world cup squad is 14. Only 11 players can play in any particular match. Hence, on match-day, the coach has 14 \choose 11 possibilities for selecting a team of 11 players from his squad. (Of course, such a situation is simply a theoretical conjecture, and no sane coach would actually list down each of 14 \choose 11 possible teams and pick one from among them.)

We can easily derive some interesting results using the above meaning of n \choose k.

1. {n \choose k} = { n \choose {n-k}}

Our task is to select a team of k players from among n players. We can do this by directly selecting which players are to be part of the team, or by selecting which players are to be left out, giving the above result.

2. n \choose k = \frac { n! } {k! (n-k)! }

Look at it this way. We have n players, say P_1, \ldots, P_n, and we want to select a team of k players from among them.

Let us select the players one by one.

For selecting the first player, we have n possible options.

Now having selected one player, we have n-1 possible options for selecting the 2nd player.

Similarly, given that we have selected i players, we have n-i options for selecting the i+1 th player.

Thus, we get that we can select k players from among n players in n \times (n-1) \times (n-2) \times \ldots \times (n-k) ways.

But wait!

There are many repetitions in the teams of 11 players that we selected. Consider any particular fixed team of k players. That team is selected by us in k! ways.

Therefore, we have: {n \choose k} = \frac {n \times \ldots \times (n-k)} {k!} = \frac {n!} {k! (n-k)!}.

——-

Note that given a list of k items, we can have k! permutations of those items. (This can be easily derived recursively.)

——-

Probability and Stochastic processes – Gambler’s ruin problem

Let’s say there are 2 tennis players, A and B. Now, A is a better player than B and hence wins 2/3 of the matches that they play. However, they don’t just play tennis – they also gamble on the results. Initially, let’s say that A has $1 and B has $2. The rule is that whoever loses a tennis match must give $1 to the other. They keep on playing tennis matches until one of them becomes bankrupt, at which point they stop. What is the probability that B will become bankrupt?

——

We note that whatever handing over of the money takes place between A and B, the total amount of money remains constant at $3.

There are 4 possible states —

State 0 — A has $0

State 1 — A has $1

State 2 — A has $2

State 3 — A has $3.

(Note that the amount of money that B has, in each of the above states, is completely determined by simply stating the amount of money with A since the total amount of money remains constant.)

We get the following transition diagram with probabilities depicted besides the arrows.

The probability that we wish to compute is the probability of reaching State 3 starting initially at State 1.

We can compute that quite easily. Let p_{i} denote the probability of reaching State 3 from State i.

We directly get the following from the above transition probabilities –

p_2 = \frac{2}{3} + \frac{1}{3} p_1

p_1 = \frac{1}{3} p_0 + \frac{2}{3} p_2.

p_0 = 0.

——-

Therefore, our desired probability, p_1 = 4/7, which means that there is 4/7 probability that the series of tennis matches would end with B becoming bankrupt.

——–

Probability – Monty Hall problem

Our friend Monty Hall is a game show host. Every week he asks one person from his audience to come onto the stage. On the stage are 3 closed doors. Behind one is a brand new car, and behind the other 2 are goats. The car is equally likely to be behind any of the 3 doors. The contestant selected from the audience obviously wants to win the car.

Monty Hall explains the rules of the game to the contestant — In front of you are 3 closed doors. Behind one of them is a car, and there are goats behind the other 2. I know what is behind each door. Your first task is to select a door.

Our contestant selects door 1.

—-

Monty now again starts expounding on probability — You have selected door 1. Now, I will select either door 2 or 3 to open. Note that I know what is behind each of the doors. I will decide which door to open based on the following rules:

- If door 2 has a car behind it, I will open door 3.

- If door 3 has a car behind it, I will open door 2.

- If neither of the above conditions hold true, i.e. both doors 2 and 3 have goats behind them, I will randomly select one among these 2 doors with probability 0.5 and open that door.

—-

Now, having explained thus to our contestant, the omniscient Monty goes and opens door 3, and reveals to the world (including our contestant) the goat standing behind it.

—-

Now, Monty turns to the contestant for the endgame and speaks thus — As you can can see, door 3 has a goat behind it. Doors 1 and 2 are still closed. Whichever door you select, you will get whatever is behind that door. You can either decide to stay with your original choice i.e. door 1, or you could decide to switch i.e. select door 2. What do you want to do — stay or switch?

As the music in the recording studio rises to a crescendo, our dear contestant is faced with this question on probability to answer. What should he do?

————

At first glance, it does seem that it is immaterial as to what the contestant chooses. One could argue that there are 2 doors (doors 1 and 2) one of them having a goat behind it, and the other having a car behind it. The car is equally likely to be behind either of the 2 doors. So, the contestant could as well flip a fair coin, and choose which door he wants to go with.

—-

However there is a flaw in the above reasoning.

– The probability that there is a goat behind door 1 is 2/3, and the probability that there is a car behind door 1 is 1/3.

To see why this is so, consider the following argument — Initially there were 3 doors. Two of them had goats behind them and one of them had a car behind it. Door 1 was one among these 3 doors. Therefore, the probability that door 1 has a car behind it is 1/3 and the probability that it has a goat behind it is 2/3.

Now, you could argue against the above explanation this way : ” What you said about the probability of a car being behind door 1 being 1/3, and that probability of a goat being behind door 1 being 2/3 was true initially, and not after Monty went and opened door 3, and showed everyone the goat standing behind it. Once Monty did that, we were left with 2 doors, and 1 goat and 1 car, and hence the probability that there is a car behind door 1 is 1/2 and the probability that there is a goat behind door 1 is also 1/2.”

But again, the argument that you presented above is not wholly correct. We can see it this way — we agree that initially i.e. before Monty opened door 3 and showed a goat behind that, the probability of these being a car behind door 1 is 1/3. My claim is that this remains the same even after what Monty did.

To better understand that, consider the following situation —

Assume that there are 100,000 doors instead of just 3. Also, behind 99,999 of these doors are goats and behind a single one is a car. The car is equally likely to be behind any of the 100,000 doors. So, what we have here is a 100,000-door version of Monty’s 3-door problem.

Now, you select a door, say door 385.

Now, Monty has been asked to open 99,998 doors, i.e. he has been asked to keep 2 doors closed. It is also specified in the rules that one of these closed doors should be door 385 since that was the one you selected. Further, Monty knows what is behind each of the doors, and Monty has been specifically instructed that the 99,998 doors that he opens should all reveal goats behind them.

So Monty goes and opens 99,998 doors. He leaves door 385 and one other door, say door 6724 closed. Now, as we said earlier, behind each of the 99,998 doors that Monty opened, was a goat.

Now, what do you think is the probability that there is a goat behind your selection door 385?

It is 1/100,000 and not 1/2.

The reason as to why door 385 remained closed even when just 2 doors remained (i.e. 385 and 6724) was because Monty had been instructed not to open 385 as part of the rules.

There is 1/100,000 probability that door 385 contained a car behind it, and Monty selected door 6724 uniformly at random from the remaining 99,999 doors for keeping closed. In such a situation, staying with your original selection (i.e. door 385) would win you the car.

However, it is much much more likely (99,999/100,000) that your initial selection (385) had a goat behind it, and that Monty knowing that 6724 had a car behind it, opened the remaining 99,998 doors. In this situation, switching (i.e. selecting door 6724) would get you the car.

—-

Coming back to our 3 door version, we can argue that switching would win you the car with probability 2/3, while staying with your original selection would get you the car with a probability of 1/3.

Hence, one who has a working knowledge of probability should advise the contestant to switch.

Probability and Statistics – 1 : coin flips

Question 1. I toss a fair coin until I obtain Heads. How many times am I expected to toss the coin (this number includes the coin toss which resulted in the Head)? What is the variance of the number of coin tosses?

Solution. This is exactly the geometric distribution with p = 0.5. Hence, I am expected to toss the coin 1/p = 2 times.

We can derive it as follows:

Let X be the random variable denoting the number of coin tosses.

If we get a H in the 1st toss, we are done. —- Pr [ X = 1 ] = 1/2. ( the probability is 1/2 since it is a fair coin.)

For us to have X = 2, the 1st toss must result in a T, and the 2nd in a H. —-

Pr [X=2]=0.5^2.

Similarly, X = 3 only if we get the sequence T T H in our 1st 3 coin tosses. —- Pr [ X = 3] = 0.5^3.

Thus, we have that,

E [ X ] = 1 \times 0.5 + 2 \times 0.5^2 + 3 \times 0.5^3 \ldots

\Rightarrow 0.5 E[X] = 0.5^2 + 2 \times 0.5^3 + \ldots

Subtracting the 2nd equation from the 1st, we get:

0.5 E [X] = 0.5 + 0.5^2 + 0.5^3 + \ldots

\Rightarrow E[X] = 2.

—-

Variance of X:

Var (X) = E[X^2] - (E[X])^2

We have that, E[X^2] = 1 \times 0.5 + 4 \times 0.5^2 + 9 \times 0.5^3 + \ldots

\Rightarrow 0.5 E[X^2] = 0.5^2 + 4 \times 0.5^3 + 9 \times 0.5^4 + \ldots

Subtracting the 2nd equation from the 1st:

0.5 E[X^2] = 0.5 + 3 \times 0.5^2 + 5 \times 0.5^3 + \ldots

\Rightarrow 0.25 E [X^2] = 0.5^2 + 3 \times 0.5^3 + \ldots

\Rightarrow 0.25 E[X^2] = 0.5 + 2 (o.5^2 + 0.5^3 +\ldots)

\Rightarrow E[X^2] = 6.

Therefore, Var (X) = 6 - 4 = 2.

—–

Question 2.I toss a fair coin until I get either HH or TT. Let X be the number of coin tosses I make. Find E[X].

Solution.

The 1st coin toss can result in either a H or a T. If we are given that X = k, then the remaining k-1 coin tosses are uniquely determined given the result of the 1st toss. The fixed sequence occurs with probability 0.5^{k-1}.

Thus, we have that, E [X] = 2 \times 0.5 + 3 \times 0.5^2 + 4 \times 0.5^3 + \ldots

Solving, we get, E[X] = 3.

—-

Probabilistic Method – 6

Q.1. Let X_1, ..., X_n be non-negative real numbers. If \lim_{n \rightarrow \infty} \frac{Var[X_n]} { (E[X_n])^2} = 0, then \lim_{n \to \infty} Pr [ X_n > 0] = 1.

Solution.

Pr [X_n \leq 0] \leq Pr[ | X_n - E[X_n] | \geq E[X_n] ] \leq \frac{Var[X_n]}{ (E[X_n])^2}. (the 2nd inequality follows from Chebyshev’s formula)

Taking limits as n \rightarrow \infty on both sides, we get the desired result.

-

Q.2. If p >> 1/n, prove that, as n approaches infinity, G(n,p) contains a triangle. If p << 1/n, prove that, as n approaches infinity, G(n,p) does not contain a triangle.

Solution.

Let Y_1, \ldots, Y_{ {n \choose 3} } be indicator variables for the corresponding 3-vertex-subsets inducing a triangle in G(n,p).

Let Y = \sum Y_i.

\therefore E[ Y ] = { n \choose 3} p^3. \ldots (1)

Var[Y] = \sum_i {Var [Y_i] } + \sum_{ i \neq j} {Cov[Y_i, Y_j]}

= {n \choose 3} (p^3 - p^6) + 12 {n \choose 4} (p^5 - p^6)

\leq {n \choose 3} p^3 + 12 {n \choose 4} p^5. \ldots (2)

(Here, Cov [ Y_i, Y_j] = 0 if the corresponding triangles do not share an edge; number of ordered pairs of triangles sharing an edge = 12 { n \choose 4}.)

If p << 1/n, E[Y] = 0 as n approaches infinity, proving one part of the question.

Now, from (1) and (2), \frac{Var [Y]} {(E[Y])^2} \leq \Theta ( \frac{1} {n^3 p^3} + \frac {1} { n^2 p}).

\therefore \mbox { If } p >> 1/n \mbox{, } \lim_{n \to \infty} \frac{Var[Y]} { (E[Y])^2 } = 0, which implies that lim_{n \to \infty} Pr [ Y > 0] = 1, proving the other part of the question.

-

Probabilistic Method – 5

Q.1. Prove that {2m \choose m} \geq \frac{2^{2m}}{4 \sqrt{m} + 2}.

Solution.

Toss an unbiased coin 2m times with the coin tosses being independent. Let X denote the number of heads in the coin tosses.

\therefore E[X] = m; \mbox{ } Var [X] = m/2.

Pr[ | X - E[X] | < \sqrt{m} ] \geq 1 - 1/2 = 1/2. (using Chebyshev’s inequality) \ldots (1)

Also, Pr[ \mbox{ exactly k heads occur} ] = { 2m \choose k} 2^{-2m} \leq {2m \choose m} 2^{-2m}. (because {2m \choose m} \geq { 2m \choose k} for all possible k )

\therefore Pr[ | X - E[X] | < \sqrt{m} ] \leq (2 \sqrt{m} + 1) { 2m \choose m} 2^{-2m}. \ldots (2)

Combine (1) and (2) to get the desired result.

-

Probabilistic Method – 4

Q.1. Prove that there exists a tournament on n vertices having at least \frac{n!}{2^{n-1}} Hamiltonian paths.

Solution.

A tournament is an orientation of a complete graph. Let each of the n \choose 2 edges be oriented, independently, in either direction with probability 1/2.

Take a random permutation, \sigma of the vertex set of the graph.

Pr[ \sigma \mbox{ represents a Hamiltonian path} ] = \frac {1}{ 2^{n-1}}.

Let X be a random variable denoting the number of Hamiltonian paths.

\therefore E [X] = \frac{n!}{ 2^{n-1}}.

-

Q.2. Independence number \alpha(G) is the size of a largest independent set of G = (V,E) . Prove that for any  graph G having n vertices, m edges and average degree d = \frac{2m}{n} \leq 1, \alpha(G) \geq \frac{n} {2d}.

Solution.

Let S \subseteq V be formed by sampling each vertex of G, independently, with probability p. Let X denote the number of vertices in S, and Y, the number of edges in the subgraph G[S] induced by S.

\therefore E[X] = np \mbox{ and } E[Y] = m p^2 = \frac{1}{2} n d p^2.

\therefore E[X- Y] = np(1 - dp/2).

\therefore There exists a subset S \subseteq V such that the difference between the number of vertices in S and the number of edges in G[S] is at least np(1-dp/2).

Remove a vertex for each edge in G[S] until the resultant set is an independent set. (This procedure removes at least as many edges as vertices.)

Set p =1/d.

\therefore There exists an independent set of size at least \frac{nd}{2}.

Probabilistic Method – 3

Q.1. Given a set S and S_1, \ldots, S_m \subseteq S, such that \forall i \mbox{, } |S_i| = l. A coloring of the elements of S is a valid 2-coloring if \forall i \mbox{, } S_i is not monochromatic. Prove that if m < 2^{l-1}, there always exists a valid 2-coloring.

Solution.

Randomly, and independently, assign either Red or Blue to each element of S with probabilities p \mbox{ and } 1-p respectively.

\forall i, Pr[ S_i \mbox { is monochromatic}] = 2[ p^l + (1-p)^l ].

By union bound, Pr[ \mbox{ some } S_i \mbox { is monochromatic}] = 2m [ p^l + (1-p)^l ].

If 2m [ p^l + (1-p)^l ] < 1, there exists a valid 2-coloring of S.

Put p = 1/2 to get the desired result.

-

Q.2. Ramsey number, \mathcal{R}(k,l) = min\{ n: \mbox{ any graph on } n \mbox { vertices has a clique of size at least } k \mbox {or an independent set of size at least } l \}. Prove that \mathcal{R}(k,k) > 2^{(k-1)/2}.

Solution.

Consider random graph G(n,p) on n vertices where each of the n \choose 2 possible edges is sampled with probability p.

Take a fixed subset S of k vertices.

Pr[ \mbox{ S is an independent or a clique}] = p^{ k \choose 2 } + (1-p)^{ k \choose 2}.

\therefore Pr[ G(n,p) \mbox { has a clique or independent set of size at least } k] \leq { n \choose k} [p^{ { k \choose 2 } } + (1-p)^{ { k \choose 2}}].

We want { n \choose k}[p^{ { k \choose 2 } } + (1-p)^{ { k \choose 2}} ]< 1.

Now, since { n \choose k } \leq n^k (and setting p = 1/2), we will find n such that:

n^k \leq 2^{ {k \choose 2} -1}, which is true whenever n \leq 2^{(k-1)/2}.

Hence, there exist graphs on at most 2^{(k-1)/2} vertices which do not have either a clique or an independent set of size at least k.

-

Probabilistic Method – 2

Q.1. Let n \geq 2k be positive integers. Let \mathcal{C} be a collection of pairwise intersecting k-element subsets of [n] = \{1, 2, \ldots, n\}  (i.e. if A, B \in \mathcal{C}, then A \cap B \neq \phi). Prove that |C| \leq { {n-1} \choose {k-1}}.

Solution.

Let A be a k-element subset of [n] selected uniformly at random from the {n} \choose {k} possible such subsets.

Now, Pr[A \in \mathcal{C}] \leq k/n \Rightarrow |C| \leq { {n} \choose {k} } . k/n = { {n-1} \choose {k-1}}.

We will try to prove that Pr[ A \in \mathcal{C}] \leq k/n.

Let A be selected as follows. Take a random permutation \sigma \in \mathcal{S}_n. Take a random index 1 \leq i \leq n, and let A = \{i, \ldots, i + k-1\} (wrapped around n).

Now, if given a fixed \sigma, Pr[ A \in \mathcal{C}| \sigma] \leq k/n, then Pr[ A \in \mathcal{C}] \leq k/n.

Further, since there are at most k possible indices for i such that \{\sigma(i), \ldots, \sigma(i+k-1)\} (wrapping around n) can be a part of \mathcal{C} (due to the property of pairwise intersection), Pr[ A \in \mathcal{C} | \sigma ] \leq k/n.

--

 

Probabilistic Method – 1

Q.1. Let A_1, \ldots, A_n and B_1, \ldots, B_n be distinct subsets of \mathbb{N}, such that:

  • \forall i, A_i \cap B_i = \phi.
  • \forall distinct i,j (A_i \cap B_j) \cup (A_j \cap B_i) \neq \phi.

Prove that, for every real 0 \leq p \leq 1, \sum_i {p^{|A_i|} (1-p)^{|B_i|}} \leq 1.

Solution. Let us create a random subset R by sampling each element of \mathbb{N} with probability p.

Let E_i denote the event that R contains all elements of A_i and none of B_i. Pr [E_i] = p^{|A_i|} (1-p)^{|B_i|}.

Now, the events E_i are pairwise disjoint.

Therefore, \sum_i {p^{|A_i|} (1-p)^{|B_i|}} = Pr[ some E_i occurs] \leq 1.

--

Q.2. Let A_1, \ldots, A_n and B_1, \ldots, B_n be distinct subsets of \mathbb{N} such that:-

  • \forall i, A_i \cap B_i = \phi.
  • \forall distinct i, j, A_i \cap B_j \neq \phi.
  • \forall i, |A_i| =r, |B_i| = s.

Prove that n \leq {r+s} \choose {r}.

Solution.

Consider a random permutation of the universe X consisting of all elements in the sets A_i, B_i.

Let E_i denote the event that all elements of A_i occur before every element of B_i in the permutation.

Pr [ E_i] = { {r+s} \choose {r}}^{-1}.

Now, the events E_i are pairwise disjoint.

Therefore, n. {{r+s} \choose {r}}^{-1} \leq 1.

--

Greed, and love of dear life: The Pirate Problem

There are n pirates discussing on how to divide their booty of 100 gold coins. The pirates have a strictly defined seniority order among themselves. Let’s say that, for all i, Pirate i is more senior than Pirate j, for all j < i.The method that they agree upon to divide their loot is the following:-

—-

Initially there are n pirates.

i = n;

While (there is a pirate alive)

Pirate i proposes a division.

If (at least 50% of the pirates alive approve of his division plan)

Division takes place, and things end here.

Else

Pirate is killed (and of course, the division proposed by him is not carried out).

i – -;

End of While loop.

End of algorithm.

We also have a few other observations to make:

  1. The pirates value their own individual lives most of all.
  2. Next to protecting their lives, the thing uppermost on their minds is to maximize their own share of the loot.
  3. The 3rd most important consideration for the pirates is to reduce the number of pirates on board.
  4. All pirates are extremely intelligent (and thus, each certainly knows that, each of the others is extremely intelligent).
  5. Whenever pirates have to make a decision (whether to approve/ disapprove of, or to propose a division plan), they keep in their minds the 4 points mentioned above.

Question: Say, n = 5, what division should the senior-most pirate propose (he is the one who gets the first chance to propose a division plan) so as to ensure that he doesn’t get killed, and given that he doesn’t get killed, to maximize his share of the loot?

Solution.

Consider what would happen with just one pirate. Obviously, he would get all the 100 coins.

With 2 pirates, Pirate 2 gets to propose a division. He knows that he will always get 50% of the vote (by approving of his own plan), and hence can propose freely any division that he likes. Hence, he will propose a division with 100 coins for himself, and none for Pirate 1, and get away with it.

With 3 pirates, Pirate 3 has to decide on how to divide the loot. He knows that if he dies, only 2 pirates would be left. In such a scenario, we all know that Pirate 1 will not get any coin. Hence, Pirate 1, being greedy, has an incentive to accept any plan that Pirate 3 proposes giving him > 0 gold coins so as to ensure that Pirate 3′s division plan is carried out. Hence, Pirate 3 can win 2 votes (out of 3) by proposing a division giving 1 coin to Pirate 1 and keeping 99 for himself.

With 4 pirates, Pirate 4 gets to decide the division scheme. Again, we know that, if Pirate 4 dies, Pirate 2 wouldn’t get even one gold coin, in the subsequent division which would be proposed by Pirate 3. Hence, Pirate 4 can win 2 votes (out of 4) by proposing a division giving 1 coin to Pirate 2, and keeping 99 for himself.

With 5 pirates, it is Pirate 5 who gets to propose. Again, his death would result in Pirate 1 and Pirate 3 not getting anything in the subsequent division which would be proposed by Pirate 4. Hence, to win their votes, and get 3 votes (out of 5), Pirate 5 can give 1 coin each to Pirates 1 and 3, and keep 98 for himself.

—-

Generalizing for n (n < 100):-

Pirate n can propose a division giving 1 coin each to Pirates n-1, n-3, … (up to either 1 or 2), and keep the rest for himself.

—-

Probability review

Basic probability review

  • Sample space {\Omega} can be viewed as the set of results obtained from the probabilistic experiment under consideration. e.g. If the experiment consists of flipping a coin twice, then the sample space would consist of {\{HH, HT, TT, TH \}}. Elements of {\Omega} are termed elementary events. Subsets of {\Omega} are termed events. e.g. {\{HH, HT\}} is the event that the first coin flip results in a Head.
  • Definition A {\sigma-} field {(\Omega, \mathcal{F})} consists of a sample space {\Omega} and a set {\mathcal{F}} which is a collection of subsets of {\Omega}, and satisfies the following properties:
    • {\phi \in \mathcal{F}}
    • closure under complement {\mathcal{\epsilon} \in \mathcal{F} \Rightarrow \bar{\mathcal{\epsilon}} \in \mathcal{F}}
    • closure under union { \forall i \in I, } { \mathcal{\epsilon}_i \in \mathcal{F} \Rightarrow \cup_{i \in I} \mathcal{\epsilon}_i \in \mathcal{F}}.

    Note that closure under union and closure and complement imply closure under intersection (since {A \cap B = \overline{ \bar{A} \cup \bar{B} } }). Note also that {\Omega \in \mathcal{F}}. Elements of {\mathcal{F}} can be termed as observable events, and {\mathcal{F}} can be termed as an information structure.

  • Definition Given a {\sigma-} field {(\sigma, \mathcal{F}}, a probability measure {Pr : \mathcal {F} \rightarrow \mathbb{R}^{+}} is a function which satisfies:
    • For all {\mathcal{\epsilon} \in \mathcal{F}}, {0 \leq Pr [ \mathcal{\epsilon}] \leq 1}.
    • {Pr [ \Omega] = 1}.
    • For mutually disjoint events, {\mathcal{\epsilon}_1, \mathcal{\epsilon}_2, \ldots }, {Pr [ \cup_i \mathcal{\epsilon}_i ] = \sum_{i} Pr [ \mathcal{\epsilon}_i ]}.

    Some other properties that follow from the above:

    • {Pr [ \overline{\mathop{\mathbb E}\mathcal{\epsilon}}] = 1 - Pr [ \mathcal{\epsilon}] }
    • {Pr [ A \cup B ] = Pr [ A ] + Pr [ B] - Pr [ A \cap B]}.
  • Definition A probability space {(\sigma, \mathcal{F}, Pr)} consists of a {\sigma-} field {(\sigma, \mathcal{F})} together with a probability measure {Pr} defined on it.
  • Principle of inclusion-exclusion\displaystyle  Pr [ \cup_{i=1}^{n} \mathcal{\epsilon}_i ] = \sum_i Pr [\mathcal{\epsilon}_i] - \sum_{i < j} Pr [ \mathcal{\epsilon}_i \cap \mathcal{\epsilon}_j] + \sum_{i < j < k} Pr [ \mathcal{\epsilon}_i \cap \mathcal{\epsilon}_j \cap \mathcal{\epsilon}_k] -/+ \ldots  
  • Definition Condition probability of event {A} given event {B} is defined as :\displaystyle  Pr [ A | B ] = \frac { Pr [ A \cap B ] } { Pr [B] } .

    From this, we have that : { Pr [ X \cap Y ] = Pr [ Y ] Pr [ X | Y ] = Pr [X ] Pr [Y | X ]}.

     

  • If {\mathcal{\epsilon}_1, \ldots, \mathcal{\epsilon}_k} denote a partition of the sample space {\Omega}, then, for any event {\mathcal{\epsilon}}, :\displaystyle  Pr [ \mathcal{\epsilon} ] = \sum_{i=1}^{k} Pr [\mathcal{\epsilon}_i ] Pr [ \mathcal{\epsilon} | \mathcal{\epsilon}_i] 
  • Bayes’ rule If {\mathcal{\epsilon}_1, \ldots, \mathcal{\epsilon}_k} denote a partition of the sample space {\Omega}, then, for any event {\mathcal{\epsilon}}, :\displaystyle  Pr [ \mathcal{\epsilon}_i | \mathcal{\epsilon}] = \frac { Pr [ \mathcal{\epsilon}_i] Pr [ \mathcal{\epsilon} | \mathcal{\epsilon}_i ] } { \sum_{j=1}^{k} Pr [\mathcal{\epsilon}_j] Pr [\mathcal{\epsilon} | \mathcal{\epsilon}_j]} 
  • Definition A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {independent} if for all subsets { S \subseteq I }, :\displaystyle  Pr [\displaystyle \cap_{i \in S} \mathcal{\epsilon}_i ] = \displaystyle \prod_{ i \in S } Pr [ \mathcal{\epsilon}_i]  
  • Equivalent defintion A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {independent} if for all {j \in I} and for all subsets { S \subseteq I \setminus \{j\} },\displaystyle  Pr [ \mathcal{\epsilon}_j | \displaystyle \cap_{i \in S} \mathcal{\epsilon}_i ] = Pr [ \mathcal{\epsilon}_j]
  • A collection of events {\{ \mathcal{\epsilon}_i | i \ in I\} } is \emph {k-wise independent} if every set of {k} events in the collection is independent. 2-wise independence is called pairwise independence.
  • Definition A random variable {X} is a real-valued function over the sample space {X : \Omega \rightarrow \mathbb{R}} such that for all {x \in \mathbb{R}}:\displaystyle  \{ \omega \in \Omega | X(\omega) \leq x \} \in \mathcal{F}
  • The distribution function { F : \mathbb{R} \rightarrow [0,1]} for a random variable {X} is defined as { F_X (x) = Pr [ X \leq x]}.
  • The density function {p : \mathbb{R} \rightarrow [0,1]} is defined as {p_X (x) = Pr [X = x ]}.
  • The joint distribution function and joint density function are defined likewise :\displaystyle  F_{X,Y} (x,y) = Pr [ \{X \leq x\} \cap \{ Y \leq y \} ] \displaystyle  p_{X,Y} (x,y) = Pr [ \{X =x\} \cap \{ Y = y \} ]

    .

Infinity of primes

Q.1. Prove that there are infinitely many primes.

Solution.

Assume to the contrary a finite set of primes \{p_1, p_2, ..., p_k\}. \ldots (1).

Consider n = p_1 \times p_2 \times ... \times p_k + 1.

Now, p_i | n-1 \forall 1 \leq i \leq k .

\therefore n is not a multiple of any prime from the assumed finite set.

Hence, n satisfies the definition of a prime number (i.e. one which is not a multiple of primes strictly smaller than itself.)\ldots (2).

(1) and (2), implies contradiction.

 

Q.2. Prove that there are infinitely many primes of the form 4n -1.

Solution.

Claim. A number of the form 4n-1 always has a prime factor of the form 4n-1.

Proof of claim.

Consider any k = 4n-1.

If k is prime, we are done.

Else, k = k_1. k_2, where k_1 \equiv 1 (mod 4) and k_2 \equiv 3 (mod 4). (verify!)

If k_2 is prime, we are done. Else, proceed as above.

The process concludes (if it does not find a greater prime of the form 4n-1) at 3.

Hence, the claim is proved.

-

Assume there are only a finite number of primes of the form 4n-1. Their product is either of the form 4n-1 or 4n+1. Add either 4 or 2, as the case may be, to obtain the next number, call it p, of the form 4n-1.

Now, p is not a multiple of any of the primes from the assumed finite set, contradicting the statement of the claim.

-

 

 

More on the Dial-a-Ride problem

We present here a so-called Structure Theorem given by Gupta et al. [1] for the non-preemptive Dial-a-Ride problem on an n-vertex metric space (V,d) having m demand pairs, and a specified start vertex, r.

We say that a vehicle is in a pick-up phase during some portion (segment) of a tour, if it only performs pick-ups during that time interval. Similarly, we say that a vehicle is in a delivery phase during a particular segment if it only performs deliveries in that segment. If the optimal tour had a structure which allowed it to be split into pick-up and delivery phases, i.e. if the optimal tour could be split as follows: the vehicle first picks up a set of at most k items; it then delivers each of those items to their destinations; it again picks up a set of at most k items, and so on, – we would have been able to apply the greedy paradigm for the problem. We could select a best-possible set of (at most k) items to service during a segment.

However, the optimal tour may not exhibit such a structure. Gupta et al. show that a near-optimal tour displays this property.

Theorem 1. Given an instance of non-preemptive Dial-a-Ride, there exists a feasible tour, \tau within O(\log {m}) of optimal, such that \tau can be split into a set of segments – \tau = S_1. S_2 ... S_t , wherein each segment, S_i services a set, O_i of at most k demand pairs (items) by first picking up all items in O_i and then delivering them to their respective destinations.

Proof. Consider an optimal tour \sigma. Represent \sigma as a line L such that every edge in L corresponds to an edge traversal in \sigma. The i^{th} edge of L corresponds to the i^{th} edge traversed in \sigma. (That is, L is simply the tour \sigma, wherein vertices visited multiple times in \sigma are represented that many times in L.)

By the triangle inequality, we have that the number of edges in \sigma, denoted by | \sigma | (equal to the number of edges in L) is at most 2m + 1.

Let d(\sigma) denote the length of \sigma.

Define stretch of item i as the number of edges traversed in \sigma after it is picked up, and before it is delivered (i.e. the number of edges it travels on the vehicle).

Divide the m items into groups G_j, j = 1, ..., \lceil \log {(2m)} \rceil, such that G_j is the set of all items having stretch between 2^{j-1} and 2^j.

We shall service items in each group separately.

Consider group G_j. Denote by \tau_j the portion of the tour where items in G_j are serviced.

We first prove a lemma relating to \tau_j.

Lemma 2. There exists a tour \tau_j of length at most 6 d(\sigma), such that \tau_j can be divided into a set of segments, each servicing at most k items in a manner similar to that stated in Theorem 1.

Proof. Let r = 2^{j-1}. Divide L into segments having r edges each. There are \lceil \frac { |\sigma|} {r} such segments- the first r edges in L form the first segment (denoted by T_{1,j}), the next set of r edges forms segment T_{2,j}, and so on.

Denote the set of items originating in segment T_{l,j} by O_{l,j}. (i.e. Items in O_{l,j} have sources between vertices (l-1)r and lr -1. Note that items in O_{l,j} are exactly those items that cross edge (lr-1, lr) in \sigma. This implies that |O_{l,j}| \leq k. Note also that items in O_{l,j} have destinations between the vertices lr and (l+2)r - 1.

Consider set O_{l,j}.

We service it as follows: Start the vehicle (initially empty) at vertex (l-1)r. Pick up items in O_{l,j} originating in T_{l,j}. Deliver these items (\leq k) to their destinations in T_{l+1,j} and T_{l+2,j}. Return with the empty vehicle to vertex lr. Denote by \omega_{l,j} this portion of the tour where items in O_{l,j} are serviced.

By concatenating the portions \omega_{l,j} for possible values of l, we obtain a tour, \tau_j that services all items in G_j.

We make the following observations:

  • Each edge in L is a part of at most 3 portions/segments: \omega_{l,j}.
  • Each edge is traversed at most twice in any segment/portion.
  • The number of items carried by the vehicle at any time in any \omega_{l,j} is at most k.

The above observations make it clear that \tau_j has length at most 6 d(\sigma) proving Lemma 2.

We service items in each of the \lceil \log {(2m)} \rceil groups, G_j similarly. We thus get a tour that is O(\log {m}) d(\sigma), completing the proof of Theorem 1.

References:

[1] A. Gupta et al – Dial-a-Ride from k-forest, 2010.

On the Dial-a-Ride problem – 3

We present here a polynomial-time 2-approximation algorithm given by Charikar and Raghavachari [CR 2001] for the Capacitated Preemptive version of Dial-a-Ride on trees. For ease of visualization, we consider rooted trees, though the algorithm works as such for any tree.

The items are assumed to be located on the leaves of the tree, T. Consider a demand pair \{s_i, t_i\}. We will call the least common ancestor of s_i and t_i (the least-depth vertex in the unique path between s_i and t_i) as the turning point for that demand pair. Given the m demand pairs, all turning points can be easily computed in polynomial time. The algorithm that follows makes use of these pre-computed turning points.

The algorithm modifies the usual depth-first search, and runs two modified traversals in sequence. In the first traversal, called the upsweep, all items are shifted upwards from their sources to their respective turning points. In the second traversal, called the downsweep, the items are shifted downwards to their respective destinations.

Consider any edge e.

  • In the upsweep phase, the algorithm recursively explores the sub-tree rooted at e at the end of which all items that must necessarily pass upwards through e are shifted to the lower end point of e. The total number of such items is flow_u(e). The items are then shifted upwards across e. Clearly, edge e is traversed 2 max \{ 1, \lceil \frac{flow_u(e)}{k} \rceil \} times.
  • In the downsweep phase, items that need to pass downwards across e are shifted to the upper end point of e before the subtree rooted at e is recursively explored. Again, it is clear that e is traversed 2 max \{1, \lceil \frac{flow_d(e)} {k} \rceil \} times in the downsweep.

Notes:

  • Preemption is used in carrying items across edge e in the above description. The vehicle travels back and forth across e carrying items to the other end point.
  • Clearly, the algorithm runs in polynomial time.
  • This is a 2-approximation algorithm. This is obtained directly from the Steiner tree, and the flow lower bounds.

References:

  • [CR 2001] M. Charikar, B. Raghavachari: The Finite Capacity Dial-a-Ride Problem, 2001.

On the Dial-a-Ride problem – 2

We consider here the capacitated Dial-a-Ride problem on the tree metric. The m demand pairs \{(s_i, t_i)\}_{i=1}^{m} are assumed to be on the leaves of the tree, T.

  • For each demand pair (s_i, t_i), we have a unique path p_i from s_i to t_i in T.
  • Call all vertices which are on any of the paths \{p_i\}_{i=1}^{m} as interesting.
  • Call all edges which are on any of the paths \{p_i\}_{i=1}^{m} as essential.
  • Clearly, any tour that satisfies all demand pairs must cover all essential edges, and hence, all interesting vertices.

We present two lower bounds on the number of edge traversals in the optimal tour.

1. Steiner tree lower bound:- Let Q be the least cost Steiner tree spanning all interesting vertices. Q would include all essential edges, and if \cup_{i=1}^{m} p_i is not connected, Q will include other edges too. Clearly, any tour that satisfies all demand pairs must traverse every edge of Q at least twice. We call this the Steiner tree lower bound.

2. Flow lower bound:- With every edge e in T, we associate two quantities flow_u(e) and flow_d(e). For every edge e = \{u,v\}, fix, for the purpose of computation of flow_u(e) and flow_d(e), an ordering of the end points. flow_u(e) equals the number of paths, p_i which require an item to pass from vertex u to vertex v. flow_d(e) corresponds to the number of paths, p_i which require an item to pass from vertex v to vertex u. (Here, ‘u’ denotes ‘upwards’, and ‘d’ denotes ‘downwards’.)

Clearly, for a vehicle of capacity k, edge e must be traversed at least 2 max \{ \lceil \frac{flow_u(e)}{k}\rceil, \lceil \frac{flow_d(e)}{k}\rceil \} in any feasible Dial-a-Ride tour. We call this the flow lower bound.

Notes:

  • The flow lower bound provides a positive lower bound only for essential edges. The Steiner tree lower bound provides a positive constant lower bound of 2 for all essential edges, and, in case \cup_{i=1}^{m} p_i is disconnected, for certain non-essential edges.
  • The flow lower bound is always at least as strong as the Steiner tree lower bound for essential edges.
  • The two bounds can be combined in the obvious manner to give a stronger lower bound.
  • For infinite or arbitrarily high capacity k, the flow lower bound translates to a lower bound of 2 for all essential edges and zero for non-essential edges. The Steiner tree bound remains 2 for all edges in Q.
  • For k=1, the flow bound translates expectedly to 2 max\{flow_u(e), flow_d(e)\}. Intuitively, it is clear that as the capacity of the vehicle decreases, the Steiner tree lower bound, which does not take this into account, becomes progressively weaker as compared to the flow bound for essential edges.
  • Intuitively, the flow bound is concerned with the capacity of the vehicle for the flow of items across essential edges. The Steiner tree bound is concerned with the connectivity requirement in order to complete the tour using a least cost superset of essential edges.

References:

M. Charikar, B. Raghavachari: The Finite Capacity Dial-a-Ride Problem, 2001.

On the Dial-a-Ride problem

Dial-a-Ride problem: Given an undirected graph G = (V,E) with a non-negative length function l : E \rightarrow \mathbb{R}^{+}, a set of m ordered source-destination pairs (demand pairs) \{(s_i, t_i)\}_{i=1}^{m} \subset V \times V (each requiring an item to be carried from s_i to t_i), and a vehicle with capacity k, find a minimum length tour which satisfies all demand pairs such that the number of items carried by the vehicle during any point in the tour does not exceed k.

  • The problem definition may or may not specify a particular starting vertex for the tour.
  • Dial-a-Ride is a restricted version of k-delivery TSP. k-delivery TSP allows an item from any particular source to be carried to any of the specified destinations, and hence generalizes Dial-a-Ride.
  • Dial-a-Ride generalizes TSP, and hence is NP-hard. Consider a TSP instance, G=(V,E). Mark any one of the vertices in V as the source vertex, and mark the remaining vertices |V|-1 vertices as destination vertices, to obtain |V|-1 demand pairs. Set the capacity of the Dial-a-Ride tour vehicle to be arbitrarily high (infinite), and set the starting vertex as the one selected to be the source. A Dial-a-Ride tour on the reduced instance corresponds to an optimal TSP tour on the original instance.
  • There are two versions of the Dial-a-Ride problem. The preemptive version allows an item to be dropped off by the vehicle at an intermediate location (different from its destination). The non-preemptive version is more restrictive and does not allow this freedom.
  • Metric version: Here the underlying graph is an n-vertex metric space (V,d).
  • Dial-a-Ride on the tree metric: The source and destination points are assumed to be on the leaves. Apart from this restriction, the problem statement remains the same as before.

On the TSP and related problems – 2

3. k-MST: (general (as opposed to metric) version) :- Given an undirected graph G= (V,E), with non-negative edge lengths l : E \rightarrow \mathbb{R}^{+}, find a tree of minimum length that spans at least k vertices.

Notes on the k-MST problem:

  • There are two versions of the k-MST problem. The rooted version specifies a root vertex, r, which must necessarily be included in the tree. The unrooted version does not place any such restriction.
  • An \alpha-approximation algorithm for unrooted k-MST gives an \alpha-approximation algorithm for rooted k-MST. Consider an instance of rooted k-MST. Add |V| additional vertices to G, and an edge of zero length between the root and each of them. Clearly, an optimal solution of the unrooted version requiring the tree to span at least |V| + k vertices in the reduced instance, is an optimal solution for the rooted version for k vertices. Hence, an \alpha-approximation algorithm for unrooted k-MST implies an \alpha-approximation algorithm for rooted k-MST.
  • An \alpha-approximation algorithm for rooted k-MST gives an \alpha-approximation algorithm for unrooted k-MST. Consider an instance of unrooted k-MST. Run the \alpha-approximation algorithm for rooted k-MST on all vertices. The least cost solution among these gives an \alpha-approximate solution to the unrooted version. ( The least cost optimal solution over all vertices (as roots) for the rooted version is an optimal solution for the unrooted version. Hence, the least cost \alpha-approximate solution over all vertices for the rooted version gives a solution that is within \alpha-factor of the optimal for the unrooted version.)
  • Budget version of k-MST: Given an undirected graph G= (V,E), a non-negative length function l : E \rightarrow \mathbb{R}^{+}, and a budget B, find a tree that spans the maximum number of vertices subject to the condition that the length of the tree is at most the upper bound B.
  • Garg [2005] gave a 2-approximation algorithm for k-MST.
  • Metric version of k-MST: Given an n-vertex metric space, (V,d), and a number k, find a minimum length tree that spans at least k vertices.

4. k-forest: Given an undirected graph G= (V,E) with non-negative length function l : E \rightarrow \mathbb{R}^{+}, a set of m demand pairs \{s_i, t_i\}_{i=1}^{m} \subset V \times V, and a number k, find a least cost subgraph that connects (satisfies) at least k demand pairs.

  • Rooted k-MST reduces to k-forest. (k-forest generalizes rooted k-MST.) Consider a rooted k-MST instance on G= (V,E) with root r. Fix |V| - 1 demand pairs \{r,v\} where v \in V - \{r\}. An optimal solution satisfying k-1 demand pairs in this reduced instance is also an optimal solution for rooted k-MST on the original instance. Also, clearly, an \alpha-approximation algorithm for k-forest gives an \alpha-approximation algorithm for rooted k-MST.
  • Unrooted k-MST reduces in polynomial time to k-forest. This follows from the reductions from the unrooted k-MST to rooted k-MST, and from rooted k-MST to k-forest. Again, an \alpha-approximation algorithm for k-forest gives an \alpha-approximation algorithm for unrooted k-MST.
  • Metric k-forest: Given an n-vertex metric space (V,d), a set of m demand pairs \{s_i, t_i\}_{i=1}^{m} \subset V \times V, and a number k, find a least cost subgraph satisfying at least k demand pairs.
  • Metric versions of rooted and unrooted k-MST reduce in polynomial time to metric k-forest.

References:

  • Naveen Garg: Saving an epsilon: A 2-approximation for the k-MST problem in graphs, 2005.

Notes on the TSP and related problems

Here are the problem definitions and some notes on the metric versions of the Traveling Salesman Problem and related problems.

A metric space is defined as an ordered pair (M,d), where M is a set and d: M \times M \rightarrow \mathbb{R} is a distance function satisfying the following properties:

  • d(x,y) \geq 0 \forall x, y \in M.
  • d(x,y) = 0 iff x = y.
  • d(x,y) = d(y,x).
  • d(x,y) + d(y,z) \geq d(x,z).

1. TSP: Given a n-vertex metric space (V,d) and m points in V, find a minimum length tour that covers all m points.

Notes on the TSP:

  • No point that is not one of the specified m points will be visited by the optimal TSP tour. This follows directly from the triangle inequality (visiting a point, v that is not one of the m specified points, can be short cut by bypassing v to obtain a tour that is shorter than one that visits v).
  • The optimal tour is a cycle, i.e. no point is visited more than once. This, again, follows from the triangle inequality.
  • TSP is NP-hard.

2. k-delivery TSP: Given an n-vertex metric space (V,d), a Source set S \subset V, a Destination set T \subset V, and a number k, such that S \cap T = \phi, \sum_{s \in S} x_s = m and \sum_{t \in T} y_t = m, where x_s represents the number of items available (i.e. capacity) at source point s, and y_t represents the requirement at destination point, t, find a minimum length tour of a vehicle that transports all m items from source points to the destination points such that the vehicle carries at most k items at all times during the tour.

Notes on the k-delivery TSP:

  • Implicit in the problem statement are the assumptions that all source points in S are identical, and that all destination points in T are identical, and that all items are identical. In other words, any item can be transported from any source to any destination.
  • The problem can require the tour to start at a particular point. Intuitively, it seems that, unlike in the case of metric TSP, having a fixed starting point can result in a longer tour than what may be obtained given the freedom to choose any starting point.
  • There are two versions of this problem. The preemptive version allows the vehicle to drop off an item at an intermediate point that is not a destination point. The non-preemptive version does not allow this freedom.
  • For the non-preemptive version of the problem, the optimal tour would involve only the points in S \cup T. This follows from the triangle inequality.
  • k-delivery TSP generalizes TSP and hence is NP-hard.
    Consider a TSP instance requiring a least cost tour covering m specified points. Choose an arbitrary point, say r,  among the specified points, designate it as a source point with capacity m-1, and designate the remaining m-1 points as destinations with unit requirement. An optimal k-delivery TSP tour for this instance corresponds to an optimal TSP tour.
  • An application of k-delivery TSP: Suppose that Coke has a certain number of storehouses in a particular town, from where it needs to transport cartons of bottles to stores across the town. A carton can go from any storehouse to any store, and there is only one vehicle for carrying the cartons, and it can carry at most a certain number of cartons at any time.

A beautiful proof

There are some proofs, for which the very fact of being acquainted with the working of the mind which came up with such beautiful ideas can transport you to a state of ecstasy.

Georg Cantor (1845-1918) gave a beautifully simple proof for proving that the power set of the set of natural numbers (N) has “more” elements than the set N itself, i.e. 2^N is uncountably infinite.

Assume to the contrary that 2^N is countably infinite. In other words, there exists a bijection between the set N and the set 2^N. (This is what is meant by a set S being countably infinite, i.e. if a set S is countably infinite, then there exists a bijection between S and the set, N of natural numbers.)

2^N = {R_0, R_1, R_2, … }  [R_0, R_1 etc are elements of 2^N i.e. subsets of N.]

Assuming that 2^N is countably infinite implies that every element of 2^N is equal to R_i for some i \in N.

Now, consider this set. Call it D.

If  0 \in R_0, then 0 \notin D.
Else if, 0 \notin R_0, then 0 \in D.

Similarly, deciding whether or not each i \in N belongs to D, is done based on whether or not i belongs to R_i.

If R_i contains i, then we do not include i in D.

And if R_i does not contain i, then we include i in D.

What do we get now?

We get a set D that is different from each R_i. (It is different from R_5 in whether or not ’5′ is contained in it; it is different from R_345 in whether or not ’345′ is contained in it. Hence, D can not be equal to set R_5. And, D can not be equal to set R_345.)

So, we see that D can not be equal to any of the sets R_i.

But, D is also a subset of N.

Hence, D should ideally be an element of the power set of N. By assumption, 2^N = {R_i : i \in N}.

We just saw that D can not be equal to any R_i.

Hence D is NOT a part of 2^N.

This is a contradiction!

We are done. 2^N can not have a bijection with N. 2^N is uncountably infinite.

Extended Longest Path problem

These are slides from a talk on the Longest Path Problem at the Penn State Theory seminar.

Talk_longest path

In the Longest Path Problem, we are given an undirected graph, G =(V,E), and integer k \geq 0 , and are asked to find out whether or not there exists a simple path of length k in G.

The Extended Longest Path Problem is a variant of the above. Here, we are asked to find out all pairs of vertices that have a simple path of length k between them.

Read more of this post

Longest Path: Method of Random Orientations

In this post, we shall consider an NP-complete problem, namely that of determining whether a simple path of length k exists in a given graph, G = (V,E).

Formally,

LONGEST PATH PROBLEM

Input: undirected graph, G = (V,E), integer k \geq 0 .

Parameter: k

To find: Does G contain a simple path of length, k?

Read more of this post

Vertex Cover and Set Cover – 4

Question: We are given an instance of the Vertex Cover problem. We are also given an algorithm for the Set Cover problem. Using this algorithm as a black box, solve the given instance of the Vertex Cover problem.

Solution:

Given instance of Vertex Cover problem: undirected graph, G = (V, E); positive integer k.
Question: Does G have a vertex cover of size at most k?

===

Read more of this post

Vertex Cover and Set Cover – 3

Consider the Vertex Cover problem.

To recapitulate:
Given: Undirected graph, G = (V, E), positive integer, k.
Question: Does G have a Vertex Cover of size at most k?

Now, we shall look at a problem related to the above question.

How do we verify the validity of a answer to the Vertex Cover problem? Read more of this post

Vertex Cover and Set Cover – 2

Having looked briefly at the Vertex Cover problem, here we shall define the Set Cover problem.

As the name suggests, we have a set, and we need to “cover” something.

Consider the following example:

We have a set of cities, S = {Madras, Delhi, Pilani, Los Angeles, London, Paris}.

Read more of this post

Vertex Cover and Set Cover – 1

Here we will look at two very interesting problems. One is known as the Vertex Cover Problem, and the other as the Set Cover Problem. Both are quite simple to define, and visualize (and equally difficult to solve efficiently :) ).

Let us consider the Vertex Cover Problem first. As the name suggests, we have to “cover” something. We are given an undirected graph, G = (V, E). We need to select a set of vertices from V, so that all edges are “covered” by this set of vertices.

What exactly do we mean by covering? A vertex “covers” all edges that are incident upon it. This does seem to be quite a natural definition.

Read more of this post

Multiway cut – 2

In the previous post on this topic, we saw an algorithm with an approximation guarantee of 2 for the multiway cut problem. Some guidelines were also given on developing a factor (2-2/k) approximation algorithm based on the ideas of the original algorithm.

Let us briefly review that.

Problem: Given a set, S = {s_1, s_2, ..., s_k} of k vertices, find a cut of minimum size (in terms of weights of edges) that separates each s_i from all other vertices in S.

Algorithm:

1. Find an isolating cut for each s_i . Call each such isolating cut C_i .

2. Output the union of the k-1 lightest isolating cuts. (i.e. exclude the heaviest of the C_i's and output the union of the rest.)

The above set of edges is a multiway cut.

We saw how the set of edges C_1 \cup C_2 \cup ... \cup C_k formed a 2 – factor approximation to the optimal cut. Using this fact, convince yourself that the above algorithm outputs a (2-2/k) factor approximation of the optimum.

Another multiway cut algorithm:

We now present a mulitway cut algorithm that performs no worse than the above (2-2/k) factor algorithm, and indeed performs better than the above in examples which we will see.

The algorithm inherits the idea of finding an isolating cut for each vertex in the set, S. Then, it performs a greedy choice, and picks the isolating cut, C_i having minimum size. It then removes the edges of cut, C_i from the graph, G to get a new graph G’. The vertex s_i is removed from the set, S. The same procedure is repeated on this new graph. (i.e. the isolating cuts are again computed for each of the k-1 vertices left in S, and the minimum cut is selected for removal from the graph.)

Once the above procedure is repeated k-1 times, we get the required multiway cut, which has an approximation guarantee of (2-2/k), and which additionally performs better than the above (2-2/k) factor algorithm in specific test cases.

The algorithm is repeated below step-wise.

1. Let G_1 = G. and S_1 = S.

Initially, i = 1. The required multiway cut, C is initially the Null Set.

Repeat the following (Steps 2-7) k-1 times.

2. Compute isolating cuts (wrt vertices in S_i ) – C_i for each vertex in S_i.

3. Let C_j be the minimum of the cuts computed in the previous step.

4. Remove the edges of C_j from G_i to get a new graph, G_{i+1} .

5. Add the edges of C_j to C. i.e. C = C \cup C_j.

6. Remove vertex, s_j from S_i to get a new set, S_{i+1}.

7. Increment i by one.

End of algorithm.

The set, C is the required multiway cut.

—–

Things to think about:

1. Convince yourself that k-1 iterations are sufficient to generate a multiway cut. (i.e. after k-1 iterations, each of the vertices in set, S is disconnected from all other vertices in S.)

2. Can you see why this algorithm will generate a cut that is no worse than the cut produced by the earlier (2-2/k) factor algorithm.

At each step , we remove a particular vertex from the set, S, and recompute the values of the isolating cuts for the remaining vertices, based on this new S. Can you see why this could lead to a better isolating cut for a particular vertex in S than was possible under the earlier algorithm.

Let us consider an example:

multiway_alg2

Example for multiway cut: the edge weights are indicated alongside the edges.

In the above example:

The minimum isolating cut for s_3 has value = 20. (One such candidate for C_3 is the set of edges { {s2,s3), {s3,C}}.
The minimum isolating cut for s_1 has value = 15. There can be more than one such cut with value = 15. Let the choose this one: {{A,B}, {A,C}}.
Similarly, value of minimum isolating cut for s_2 is 15; and one such cut is given by: {{s2,B}, {s2,s3}}.

Now, how would the first algorithm proceed?

The union of the 2 lightest isolating cuts is outputted. Hence, the multiway cut produced by the algorithm is: {{A,B}, {A,C}, {s2,B}, {s2,s3}}. The value of this cut is 30.

How does the 2nd algorithm perform in this example?

Take a minimum value cut from the set of isolating cuts and add it to C. We can add either C_1 or C_2 since both have equal minimum value. Suppose, we add C_1 to C. Now, C = {{A,B}, {A,C}}.
Remove edges of C_1 from G, and recompute minimum isolating cuts for s_2 and s_3 on the new set, S = {s2, s3}.
We get a minimum isolating cut of value = 10 for both s_2 and s_3 . This is given by the edge {s2,s3}.
Add this edge to C, and we are done.

Therefore, C = {{A,B}, {A,C}, {s2,s3}}.

Value of this multiway cut = 25.

Hence, we can see how the 2nd algorithm performs better than the first in this example.

On a separate note, convince yourself that the second algorithm will perform no worse than the first on any input graph.

———

Another example (here the second performs asymptotically twice as better as the first):

Example for multiway cut

Example for multiway cut

Note: In the above diagram, the letter e has been used to denote \epsilon > 0 .

The set, S consists of k vertices, s_1 through s_k as shown. (The diagram is for k=5)

Performance of first algorithm on above instance:
Value of C_1 = 2 - 2 \epsilon .
For all other vertices in S, value of C_i = 2 - \epsilon.

Therefore, the value of cut produced by algorithm = (k-2) \times (2-\epsilon) + (2 - 2 \epsilon).

Performance of second algorithm on above instance:
Value of minimum C_i is 2- 2 \epsilon given by C_1 . On removing C_1 from G and s_1 from S, we find that the values of all other isolating cuts become equal to one. (See the diagram.) We proceed through the algorithm as specified to get cut, C.

Value of cut, C thus produced = (k-2) \times 1 + (2 - 2 \epsilon) .

Verify that for large values of k, as \epsilon —> 0, the ratio of values of the two cuts approaches 2.

Hence, the second algorithm performs twice as better asymptotically in this case.

3. Consider this example:

A tight example for both algorithms.

A tight example for both algorithms.

(Note: The letter, e has been used in the diagram to represent, \epsilon >0 .

Verify for yourself the following:

1. The optimal multiway cut for the above instance is of value = k. (Consider removing all edges of the cycle.)

2. The first algorithm produces a cut of value = (k-1) \times (2 - \epsilon) .

3. The second algorithm produces a cut of value = (k-1) \times (2 - \epsilon).

4. Asymptotically, both algorithms achieve an approximation factor of (2-2/k).

(I’m sorry for any errors in this post.)

Multiway Cut

A cut, C in a graph, G= (V, E) is a set of edges, whose removal from G disconnects G. i.e. G’ = (V, E – C) is disconnected. (A graph is said to be connected if there exists a path between any two vertices in the graph. Hence, a graph is said to be disconnected if there exist two vertices in the graph, say u and v, such that there exists no path between u and v in the graph.)

An s-t cut, C is a set of edges whose removal from the graph, disconnects s and t.

(Note: Throughout this discussion, we shall talk only about undirected, edge-weighted, connected graphs. Also, all edge weights are integers.)

Size of a cut: This means the total weight of all edges forming the cut.

We want to disconnect the graph; but we want to do so by removing as few (in terms of weight) edges as possible. That, we wish to find a cut of minimum size. Such a cut is called a min – cut.

Example for min – cut:

Min Cut example: All edges in the graph have weight = 20, other than the 3 highlighted edges. Those 3 edges form the min cut.

Min Cut example: All edges in the graph have weight = 20, other than the 3 highlighted edges. Those 3 edges form the min cut.

An s-t min cut, as is clear from the name, is a cut of minimum size that separates vertex s from vertex t.

Example for s- t min cut:

Example for s-t min cut

Example for s-t min cut

(The highlighted edges form an s-t min cut of value = 40. Note that the min cut value for this graph is 5 (obtained by removing edge {a,b}.)

=========

A multiway cut is a generalization of an s-t cut.

Here, instead of two vertices, s and t, we have to separate a set of k vertices, {s_1, s_2, ..., s_k} from each other.

In other words, find a set of edges, say C, such that on removing C from the graph, there exists no path between s_i and s_j for any i, j \in {1,2, ..., k} .

Our task is to find a multiway cut of minimum size.

Example of multiway cut:

By removing all edges shown in the figure, we can separate vertices, s1, s2, .., sk, from one another.

By removing all edges shown in the figure, we can separate vertices, s1, s2, .., sk, from one another. (Note that within each ellipse around each si, there may be a number of vertices and edges between those vertices. They have not been shown for sake of clarity.)

(The above example shows a multiway cut.)

========

Question: How do you find a multiway cut of minimum size?

It turns out that it is hard to find such a cut in polynomial time. But that’s not the end of the story. What we can do efficiently (in general, an algorithm is efficient if it runs in polynomial time) is to find a multiway cut that is within a factor of 2 from the optimal. In other words, if the value of the optimal multiway cut of a graph, G is OPT (G) , our algorithm will produce a cut of value \leq 2 \times OPT (G) for every graph, G.

(So, even though we don’t find an optimal cut, we find one that is guaranteed to be “close” to optimal.)

========

Even though, we do not know of any efficient algorithm, for solving a multiway cut problem exactly for arbitrary values of k, we do know how to solve it efficiently for k = 2. (Note that for k = 2, the multiway cut problem is just the s-t cut problem. An optimal s-t cut can be found efficiently. [It is actually quite simple to do it; but that is not our concern here.] )

We will use this efficient s-t min cut algorithm to find our approximate multiway cut.

————–

Let us review once again what we want. Let us denote by S the set of k vertices, {s_1, s_2, ..., s_k} .

We want to remove a set of edges, such that on their removal:
- there exists no path from s_1 to s_2 , s_1 to s_3 , and so on till s_1 to s_k .
- there exists no path from s_2 to s_1 , s_2 to s_3 , and so on till s_2 to s_k .
- there exists no path from s_3 to any of the other vertices in S.
- and so on, for each vertex in the set, S.

The above explanation of the problem actually provides us with a straightforward algorithm for the same.

Take vertex, s_1 . Find a set of edges, removing which would disconnect s_1 from every other vertex in S.
Let C_1 be such a set of minimum size (i.e. minimum total weight).

How do we find C_1? Simple; we want a min cut that separates s_1 from all other vertices of S. What we’ll do is merge all these other vertices of S into one single vertex, call it t. Find an s-t min cut with s_1 as the vertex, s, and the merged vertex as t.

(Merging: When we merge any two or more vertices, the edges between these merged vertices disappear. The other edges that were incident upon these vertices, remain as such, and are now incident upon the new merged vertex.

Example for merging vertices:

The graph on the right is obtained by merging vertices, C, D, E and F into one "super-vertex" denoted by {C,D,E,F}. Note that there are multiple edges between the "super-vertex" and some other vertices of the graph.

The graph on the right is obtained by merging vertices, C, D, E and F into one "super-vertex" denoted by {C,D,E,F}. Note that there are multiple edges between the "super-vertex" and some other vertices of the graph.

(Each of the edges {F,G} and {E,G} translates into an edge between the super-vertex and G. Hence, there are 2 edges between {C,D,E,F} and G.)

Therefore, using the efficient algorithm for finding an s-t min cut, we can compute C_1 . On removing the edges of C_1 , vertex s_1 will be disconnected from all other vertices in S.

So, the first part of our job is done (disconnecting s_1 from all others in S.)

(Note: We call C_1 as an “isolating cut” for vertex, s_1 .)

Similarly, find isolating cut C_2 corresponding to vertex s_2 . On removing the edges of C_2 , s_2 is disconnected from every other vertex of S.

Therefore, by removing the union of C_1 and C_2 , we can disconnect s_1 and s_2 from every other vertex in S (and obviously, from each other also).

Now, you get an idea of how to proceed, right?

Find an isolating cut C_i corresponding to each vertex, s_i in the set, S.

The union of all these isolating cuts, i.e. C_1 \cup C_2 \cup ... \cup C_k is a multiway cut separating s_1, s_2, ..., s_k from one another.

=========

Question: We had claimed that the cut produced by the algorithm is always within a multiplicative factor of 2 within the optimal. How do we prove this?

Let us call the cut produced by the above algorithm as C. i.e. C = C_1 \cup C_2 \cup ... \cup C_k .

Let us call the optimal cut as A.

Now, on removing the edges of A from G, we would get k (disjoint) connected components in the graph, one connected component for each vertex, s_i .

Let A_1 be the subset of edges from A, that is incident upon the connected component of s_1 .

Graph showing A1. (A1 is the set of dashed edges.) (Note that within the connected components of vertices other than s1 have not been shown. Also, the edges between the various connected components other than the one containing s1 have not been shown, for sake of clarity.)

Graph showing A1. (A1 is the set of dashed edges.) (Note that vertices and edges within the connected components of vertices other than s1 have not been shown. Also, the edges between the various connected components other than the one containing s1 have not been shown, for sake of clarity.)

As is quite obvious, A = A_1 \cup A_2 \cup ... \cup A_k .

It is also clear that each edge of A appears in exactly two A_i 's . (If an edge of A is between the connected component of vertex s_i and s_j , then it appears in the sets A_i and A_j . )

From this, we have that:

\sum_{i=1}^{k} w(A_i) = 2 \times w(A) .  —————————Eqn. (1)

Here, w(X) denotes the total weight of edges in set, X.

Now, we know that C_1 is an isolating cut of minimum size for s_1 .

Therefore, w(C_1) \leq w(A_1) .

Similarly, w(C_i) \leq w(A_i) for all i \in {1, 2, ..., k} .

Therefore, \sum_{i=1}^{k} C_i \leq \sum_{i=1}^{k} A_i .

Combining the above equation with Eqn. (1), we get that:-

\sum_{i=1}^{k} C_i \leq 2 \times w(A) .

i.e. the cut produced by our algorithm is always within a factor of 2 of the optimal multiway cut.

In other words, we are done. :)


Points to ponder:

1. Do we necessarily need to take the union of all k sets C_1 till C_k to obtain a multiway cut? What if we take the union of only k-1 of these sets? Will that union be a valid multiway cut?

2. As an extension of the above point: which k-1 sets should we choose in order to minimize the size of the cut produced? (Hint: you can discard the heaviest among the C_i 's and output the union of the rest.) We know that outputting the union of all k sets results in a factor-2 approximation algorithm. What is the approximation factor for this improved algorithm? (Hint: Prove that the approximation factor is (2 – 2/k). )

(The topic of multiway cut is dealt in Chap. 4 of “Approximation Algorithms” by Vijay Vazirani.)

Life of Srinivasa Ramanujan

It is one of the most romantic stories in the history of mathematics: in 1913, the English mathematician G. H. Hardy received a strange letter from an unknown clerk in Madras, India. The ten-page letter contained about 120 statements of theorems on infinite series, improper integrals, continued fractions, and number theory (Here is a .dvi file with a sample of these results). Every prominent mathematician gets letters from cranks, and at first glance Hardy no doubt put this letter in that class. But something about the formulas made him take a second look, and show it to his collaborator J. E. Littlewood. After a few hours, they concluded that the results “must be true because, if they were not true, no one would have had the imagination to invent them”.

Thus was Srinivasa Ramanujan (1887-1920) introduced to the mathematical world. Born in South India, Ramanujan was a promising student, winning academic prizes in high school. But at age 16 his life took a decisive turn after he obtained a book titled A Synopsis of Elementary Results in Pure and Applied Mathematics. The book was simply a compilation of thousands of mathematical results, most set down with little or no indication of proof. It was in no sense a mathematical classic; rather, it was written as an aid to coaching English mathematics students facing the notoriously difficult Tripos examination, which involved a great deal of wholesale memorization. But in Ramanujan it inspired a burst of feverish mathematical activity, as he worked through the book’s results and beyond. Unfortunately, his total immersion in mathematics was disastrous for Ramanujan’s academic career: ignoring all his other subjects, he repeatedly failed his college exams.

As a college dropout from a poor family, Ramanujan’s position was precarious. He lived off the charity of friends, filling notebooks with mathematical discoveries and seeking patrons to support his work. Finally he met with modest success when the Indian mathematician Ramachandra Rao provided him with first a modest subsidy, and later a clerkship at the Madras Port Trust. During this period Ramanujan had his first paper published, a 17-page work on Bernoulli numbers that appeared in 1911 in the Journal of the Indian Mathematical Society. Still no one was quite sure if Ramanujan was a real genius or a crank. With the encouragement of friends, he wrote to mathematicians in Cambridge seeking validation of his work. Twice he wrote with no response; on the third try, he found Hardy.

Hardy wrote enthusiastically back to Ramanujan, and Hardy’s stamp of approval improved Ramanujan’s status almost immediately. Ramanujan was named a research scholar at the University of Madras, receiving double his clerk’s salary and required only to submit quarterly reports on his work. But Hardy was determined that Ramanujan be brought to England. Ramanujan’s mother resisted at first–high-caste Indians shunned travel to foreign lands–but finally gave in, ostensibly after a vision. In March 1914, Ramanujan boarded a steamer for England.

Ramanujan’s arrival at Cambridge was the beginning of a very successful five-year collaboration with Hardy. In some ways the two made an odd pair: Hardy was a great exponent of rigor in analysis, while Ramanujan’s results were (as Hardy put it) “arrived at by a process of mingled argument, intuition, and induction, of which he was entirely unable to give any coherent account”. Hardy did his best to fill in the gaps in Ramanujan’s education without discouraging him. He was amazed by Ramanujan’s uncanny formal intuition in manipulating infinite series, continued fractions, and the like: “I have never met his equal, and can compare him only with Euler or Jacobi.”

One remarkable result of the Hardy-Ramanujan collaboration was a formula for the number p(n) of partitions of a number n. A partition of a positive integer n is just an expression for n as a sum of positive integers, regardless of order. Thus p(4) = 5 because 4 can be written as 1+1+1+1, 1+1+2, 2+2, 1+3, or 4. The problem of finding p(n) was studied by Euler, who found a formula for the generating function of p(n) (that is, for the infinite series whose nth term is p(n)xn). While this allows one to calculate p(n) recursively, it doesn’t lead to an explicit formula. Hardy and Ramanujan came up with such a formula (though they only proved it works asymptotically; Rademacher proved it gives the exact value of p(n)).

Ramanujan’s years in England were mathematically productive, and he gained the recognition he hoped for. Cambridge granted him a Bachelor of Science degree “by research” in 1916, and he was elected a Fellow of the Royal Society (the first Indian to be so honored) in 1918. But the alien climate and culture took a toll on his health. Ramanujan had always lived in a tropical climate and had his mother or his wife to cook for him: now he faced the English winter, and he had to do all his own cooking to adhere to his caste’s strict dietary rules. Wartime shortages only made things worse. In 1917 he was hospitalized, his doctors fearing for his life. By late 1918 his health had improved; he returned to India in 1919. But his health failed again, and he died the next year.

Besides his published work, Ramanujan left behind several notebooks, which have been the object of much study. The English mathematician G. N. Watson wrote a long series of papers about them. More recently the American mathematician Bruce C. Berndt has written a multi-volume study of the notebooks. In 1997 The Ramanujan Journal was launched to publish work “in areas of mathematics influenced by Ramanujan”.

(This biography of this awe-inspiring genius is taken from here.)

Interesting questions

Here are two simple and interesting ideas.

1. A group of ‘n’ persons are in a classroom. Prove that there always exist two people having the same number of friends.

(Note that if ‘a’ is a friend of ‘b’, then ‘b’ is also a friend of ‘a’.)

This can be easily proved using a principle known as the Pigeon Hole Principle.

This basically states that if we have larger number of pigeons and a lesser number of holes, then there will be at least one hole with more than one pigeon. Quite simple.

Returning to the question, assume there is at least one person in that group who is a friend of everyone else. This implies that there does not exist any person having zero friends. (Since every person has at least one friend- the one who is the friend of everyone else.)

So, what are the possible values for the number of friends any person in the group can have?

1, 2, 3, …., n-1. (As we observed above, value zero is not possible here. )

Now, what are the total number of persons in the group?

n.

So, we have ‘n’ persons, to each of whom we need to assign a value (number of friends that person has). We can assign values from a set of only ‘n-1′ possible values.

Hence, there will be at least two persons having the same number of friends, for the case that there is at least one person who is a friend of everyone else.

Now, what if there is no person having ‘n-1′ friends?

In this case, the set of values indicating the possible number of friends a person can have, does not include value ‘n-1′.

(The set of values is: {0, 1, 2, …, n-2} )

The same argument as above applies here. Hence proved.  QED

Point to ponder:

It was stated earlier that: If ‘a’ is a friend of ‘b’, then ‘b’ is also a friend of ‘a’. Question: Where has that fact been used in the above proof?

2. A group of ‘n’ friends live in a hostel. Let us name the friends as 1, 2, 3, …, n. Each one has a room assigned to him/her. (Person 1 is assigned Room 1; Person 2 is assigned Room 2, and so on.) There are ‘n’ rooms in all- one for each person.

One day, on coming back to the hostel after a trip, they decide to randomly enter whichever room they wish to. For instance, Person 1 enters Room 5; Person 2 enters Room 3; and so on (keeping in mind that only one person can enter any given room.)

Question: What is the expected number of persons who entered their assigned rooms?

Randomly entering a room (keeping in mind that only one person enters any given room) corresponds to a random permutation of  1, 2, …, n.

In other words, randomly re-order the sequence 1, 2, …, n.

For example (n= 5 ), we get 3, 5, 1, 4, 2. Here , Person 3 enters Room 1; Person 5 enters Room 2; and so on.

Now, what is the probability that Person 1 enters Room 1?

Answer: 1/n. (Since he has ‘n’ rooms to choose from.)

Let us define a few random variables which will help us in computing the expected value.

Let X_i be a random variable that has value = 1 if Person ‘i’ enters Room ‘i’, and value = 0, if Person ‘i’ does not enter Room ‘i’.

We need to find the expected number of persons in their assigned rooms. This is the same as: \Sigma _{i=1} ^n E[X_i] .

Let us compute E[X_i] .

E[X_i]  =      Pr[Person 'i' enters Room 'i'] x [Value of X_i for this event]  +  Pr[Person 'i' does not enter Room 'i']  x [Value of  X_i for this event]

= (1/n) x 1 + (n-1)/n x 0

= 1/n.

(Notation: E[X] denotes Expected value of random variable X; Pr[e] denotes probability that event ‘e’ occurs.)

Now, we will use a technique called “Linearity of Expectation. What this says is that if we have two random variables, X and Y, then E[X + Y] = E[X] + E[Y]. (This can be extended to any number of random variables, not just two.)

(Now, if X and Y are independent random variables (i.e. Y’s value does not depend on what the value of X is; and vice versa), then it is intuitively clear that the expectation of the sum should equal the sum of expectations. But the powerful idea behind the principle of Linearity of Expectation is that this holds even when the random variables are not independent.)

Coming back to the question:-

We can apply Linearity of Expectation to solve the problem.

E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n] .

Therefore, E[X_1 + X_2 + ... + X_n] = 1/n + 1/n + … + 1/n = 1

In other words, the expected number of persons who will enter their assigned rooms is one.

——–

As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality. – Albert Einstein

——–

How wonderful that we have met with a paradox. Now we have some hope of making progress. -Niels Bohr


Bipartite Matching: An application

Firstly, we define the concept of a matching.

Suppose we are given a graph, G = (V, E). As the name suggests, a matching means a pairing of vertices of the graph. In other words, each vertex is “matched” to exactly one other vertex in the graph to which it had been connected by an edge in E. We say that such edges, which “match” two vertices with one another, are part of the matching.

Formally, a matching is a set of edges, S \subseteq E such that no two edges in S are adjacent to each other. (Two edges are adjacent if share a common end-vertex.)

Example:

Matching Example

Matching Example : Bold edges {(1,2), (4,5)} form a matching.

Maximum matching:

A matching of maximum cardinality is called a maximum matching. (Note that a graph may have more than one matching of maximum cardinality.)

Example:

{ (1,2), (3,4), (5,6) } form a maximum matching

{ (1,2), (3,4), (5,6) } form a maximum matching

Clearly, we can not get a matching having more than 3 edges for the above graph. (There are only 6 vertices; for each pair of vertices there can be at most one edge in the matching; hence there can not be a matching with greater than 3 edges.) Hence, this is a maximum matching.

Perfect Matching:

We were able to pair up all vertices of G in the above example. Such a matching is called a perfect matching.

A perfect matching may not exist for many graphs.

Example:

Maximum matching is of size one. Two vertices are unpaired in maximum matching.

Maximum matching is of size one. Two vertices are unpaired in maximum matching.

If the graph we consider is a bipartite graph, then the matching in such a graph is termed as a bipartite matching.

Practical application of bipartite matching: Job recruitment process

Suppose there are ‘n’ companies competing to hire students from a university, and that ‘m’ students are available for hiring. Assume that each company has only one job opening, and hence can hire at most one student.

After the tests and interviews, each company shortlists certain candidates. The company sees no distinction among its shortlisted candidates, and is willing to hire any one of them to fill its opening.

It is clear how this is a problem of bipartite matching.

Let L be the set of companies, and R be the set of students.

An edge exists between, l \in L and r \in R if r is one of the candidates shortlisted by the company.

Each company can hire at most one (by our assumption; either it hires a student, or goes home empty-handed.)

Quite obviously, each student can work in at most one company (either the student is selected by one company; or goes home empty-handed.)

Hence, this is a clear case of bipartite matching between the set of companies and the set of students.

The university would certainly like to find out the maximum number of students who can get placed through this recruitment process.

In other words, the university wishes to find out the size of the maximum bipartite matching possible for the company-student graph.

There exist polynomial time algorithms for computing a maximum bipartite matching. Hence, the problem can be solved by running any of those algorithms on the given instance of the company-student graph.

Points to ponder:

1. When does a bipartite graph have a perfect matching? What is the common structure of such bipartite graphs, which enable them to have a perfect matching? In particular, there is a very interesting theorem known as Hall’s Marriage Theorem. The theorem has two parts to it: an “if” part, and an “only if” part. One of these is simple to visualize, while the other is somewhat non-intuitive. You could read up on that.

2. There is a concept of maximal matching.

A maximal matching of a graph G = (V, E) is a matching to which no other edge of E can be added without violating the matching property. (Matching property requires that all edges of a matching be non-adjacent.)

Example:

Maximal matching: No edge can be added to the matching { (1,6), (3,4) } without violating the matching property.

Maximal matching: No edge can be added to the matching { (1,6), (3,4) } without violating the matching property.

The size of this particular maximal matching of G is 2. (Unlike maximum matchings of a graph which are all of the same size, the maximal matchings of a graph can be of different sizes.)

But we can certainly do better than this in terms of getting a matching of higher cardinality.

A Maximum matching for G: { (1,2), (3,4), (5,6) }

A Maximum matching for G: { (1,2), (3,4), (5,6) }

(You can also observe that the above matching is also a perfect matching of G (all vertices are paired).)

Question: Can you find a relation between the size of a maximal matching and the size of a maximum matching. Let l be the size of a particular maximal matching, and let m denote the size of any maximum matching (all maximum matching are, by definition, equal in size).

We know that l \leq m .

Claim: There exists a constant factor,k > 1 , such that m \leq l \times k , for all possible l .

(Notes: How can we get a matching of greater size using a given maximal matching? Clearly, we can not add an edge to it (by definition of maximality, the resultant set will not be a matching.)  So, we need to remove an edge from the maximal matching. Suppose we remove a particular edge, (u,v) . We are, in effect, freeing two vertices, u and v , which are available to be paired up with some other free vertices. Hence, on removing one particular edge from a maximal matching, we can hope to add at most two new edges in its place. This suggests that the size of a maximum matching is at most twice the size of a maximal matching (i.e. k = 2).

I’m not sure about the correctness of the above explanation, as it omits certain details, and may not be rigorous in terms of exploring all possible alternatives for building a maximum matching. On an informal level, this explanation might suffice.)

Max-Cut Problem

The Maximum Cut Problem asks us to find an edge-cut of maximum cardinality. (We are considering undirected, unweighted graphs.) In other words, we are required to find a partition of the vertex set, V into (X^*, V-X^*) such that the number of edges crossing the partition is maximized.

The algorithm presented here is based on a technique known as local search.

Algorithm:

Take any arbitrary partition of the vertex set, (X, V-X). Find a vertex which when flipped to the other set of the partition would have more of its incident edges crossing the cut.

Flip the vertex to the other set of the partition. Continue searching for such a vertex in this new partition. Continue this procedure generating new partitions until no such vertex satisfying the above condition exists.

—-

Consider this example:

graph4

We are given an arbitrary cut as shown above. Consider vertex, 1. Out of its 3 incident edges, 2 are incident on vertices of S1 (its current set), and one is incident on a vertex of set S2. Hence, by moving vertex, 1 to set, S2, we can improve the cardinality of the cut by one.

This is what we get by moving vertex,1 :-

Cut improved by flipping vertex,1
Cut improved by flipping vertex,1

Observations on the algorithm:

1. Each “flip” of a vertex from one set of the partition to the other increases the size of the cut by at least one.  (If vertex, 1 is incident on x_1 vertices in set S_1 , and x_2 vertices in set S_2 , (and x_1 > x_2 ) then flipping vertex, 1 from S_1 to S_2 would increase the size of the cut by x_1 - x_2. )

2. The above point shows why the algorithm runs in polynomial time. The maximum possible size of a cut of G=(V,E) is |E|. From Observation 1, we can conclude that the algorithm makes at most |E| flips. Further, each flip can be done in polynomial time. Hence, the algorithm is of polynomial time complexity.

3. What is the minimum number of edges that the algorithm returns in its cut? In other words, give a lower bound on the size of the cut returned by the algorithm.

This is simple to compute.

Let (X, V-X) be the cut returned after the termination of the algorithm. (Termination of the algorithm implies that flipping any vertex from one set of the partition to the other would not produce a better cut.)

Consider any vertex, v in this cut. At least half of the edges incident on vertex, v in graph G would be crossing (X, V-X). (since otherwise it would mean that flipping v would improve the cut returned by the algorithm, indicating that the algorithm has not yet terminated i.e. contradiction.)

Therefore, at least half of all edges of G cross the cut returned by the algorithm.

|C| \geq |E|/2

where |C| represents the size of the cut returned by the algorithm.

4. Consider the following example:

For the above graph, let an arbitrary cut be given by:

Arbitrary cut used as starting point for algorithm
Arbitrary cut used as starting point for algorithm

(Edges within sets S1 and S2 have been omitted for clarity.)

If we start our algorithm with this cut as a starting point, we find that we can find no vertex which is suitable for flipping.  ( Consider vertex, a: Flipping it would make it incident  on n/2 vertices of S1. But it is already incident on n/2 vertices on S2. Flipping it will not improve the cut. Same is the case with vertex, b. Similarly flipping any of 1,2,..,n would improve the cut.)

Hence, this is the best cut that the algorithm can produce when provided with this particular cut as a starting point.

Size of cut produced by algorithm = n

But this is certainly not a maximum cut for the graph. Consider the following cut :-

A Max cut for the graph
A Max cut for the graph

Size of above cut = 2n.

This is clearly the maximum cut possible for the graph, since |E| itself is 2n.

Hence, our algorithm produces a cut that is factor of 1/2 of the optimal in this particular instance.

( Please note that the output of the algorithm for a particular input graph varies according to the arbitrary cut that is chosen as its starting point. )

5. The best possible cut can be of size |E|.

From Observation 3, it is known that the algorithm produces a cut of size at least |E| / 2.

Hence, the algorithm is a factor- 1/2 approximation algorithm. (i.e. it produces a cut that is always within a factor of 1/2 of the optimal)

Observation 4 indicates that the factor of 1/2 is indeed a tight bound on the performance of the algorithm through an example where the algorithm performs no better than than a factor of 1/2. (Such examples are known as “tight examples“).

Birthday Paradox

In a group of 23 people, the probability that two or more people will share their birthdays is greater than 50%. For a group of 57 people, this probability increases to 99%.

This may seem counter-intuitive at first sight. There are 365 possible days on which a birthday can occur. So it may seem odd that in a group of 57, there is 99% probability of a shared birthday.

Question: In a group of ‘n’ people, what is the probability that any of those ‘n’ persons shares a birthday with any of the others in the group?

This is very simple.

The answer is simply:

1 – (364/365) (363/365) … ((365 – n + 1 )/365 )

=   365 !  / [ (365^n) (365 - n)! ].

Approximating this function, and plugging in values of n, we get the probability of a group of those many people having a common birthday in their midst.

See http://en.wikipedia.org/wiki/Birthday_paradox for more on this.

On the Shortest Path and the MST Problems

This is a pre-submission copy of my undergraduate thesis “On the Shortest Path and Minimum Spanning Tree Problems”.

On the Shortest Path and Minimum Spanning Tree problems

This is more like a restricted survey on some algorithms for computing shortest paths and MSTs, and is quite elementary.

Vertex cover Problem

Here we’ll see a simple algorithm that finds a near-optimal vertex cover of a graph.

Firstly, what is a vertex cover? A vertex cover is a set of vertices, call it S, such that the following condition holds: every edge of the graph has at least one end-point in S.

( i.e. we are “covering” all the edges of the graph through the vertex cover.)

Example: If vertex, x is adjacent to vertices, a, b, and c, then it “covers” edges {x,a}, {x,b} and  {x,c}.

What we ideally want is a vertex cover having the minimum number of vertices, i.e. we want to “cover” all the edges of the graph using as few vertices as possible.

That would an optimal vertex cover. So, what is meant by “near-optimal”? This means that the vertex cover we find is not exactly one of the minimum possible cardinality. However, it also means that this vertex is close in size, say by a constant multiplicative factor, to the optimal one.

Why should we settle for a solution that is not as good as optimal? The reason is simple: some questions are inherently difficult to solve exactly. Also, for some problems, a near-optimal solution is good enough.

(Note that we only consider graphs that do not have any isolated vertices.)

We digress here a bit to introduce another concept.

Consider a graph, G. Pick an edge of G. Now, pick another edge of G, that is not adjacent to the edge picked earlier (i.e. the two edges do not have any common end-points). These two edges are said to be disjoint.

A matching in a graph is a set of disjoint edges.

What we would like to see is how many edges we can possibly have in a matching.  A maximum matching is a matching having the maximum number of edges that any matching of that particular graph can possibly have. (Note that we say “a” maximum matching, instead of “the” maximum  matching, since a graph can have multiple maximum matchings.)

The vertex cover algorithm is based on two simple ideas.

Idea 1If we take the end-vertices of all edges forming a maximum matching of a graph, G, then this set of vertices, say S, forms a vertex cover of G.

This is simple to prove. Assume there is an edge {x,y} which is not “covered” by S, i.e. neither x nor y is in S. In that case, we can add edge {x,y} to get a new matching. This matching has a size greater than that of our maximum matching, which is not possible. Hence, all edges of G are “covered” by S.

As stated earlier, we are interested in finding a vertex cover having minimum number of vertices. How does set S fare on this account?

Idea 2: The size of a maximum matching of G is a lower bound on the size of a vertex cover of G.

i.e. The best possible vertex cover of G (one with least possible number of vertices) will have at least ‘k’ vertices, where ‘k’ is the number of edges in a maximum matching of G. ( k = |S| / 2, in our  terminology.)

This is also easily proved.  Let’s call the vertex cover as T. Now, every edge of the maximum matching, call it M, must be “covered” by T. So, every edge of M must have at least one end-vertex in T. Also, there is no overlap whatsoever among the end-vertices of any two edges of M. Therefore, T must contain at least |M| vertices.

We can’t obtain a vertex cover with lesser number of vertices.

“Idea 1″ showed how to obtain a vertex cover of size, |S| = 2 |M|.  And “Idea 2″  showed that a minimum vertex cover needs to have at least |M| vertices.

So, what we obtained through “Idea 1″ is a vertex cover that is within a factor of two from the optimal. i.e. The number of vertices it contains is no more than twice the number of vertices contained in an optimal vertex cover.

Such an algorithm is known as a factor-2 approximation algorithm.

Notes:

1.  There exist algorithms that can find a maximum matching in polynomial time. (In algorithmic terms, a polynomial running time is considered to be “efficient”.) On the other hand, we do not know of any such efficient algorithm for obtaining an optimal vertex cover. Hence, we use the matching problem to develop an efficient approximation algorithm.

2. Suppose a network of roads in a town is modeled as a graph, with the streets representing edges, and intersections between streets being the vertices. We are asked to place policemen at the street intersections such that all streets are monitored by at least one policeman. ( A policeman can monitor all streets that begin/end at his intersection.) We want to use as few policemen as possible. This is the same as the optimal vertex cover problem.

Does Kruskal produce all MSTs?

The area of Theoretical Computer Science has so many beautiful questions. Among these are a few questions, that can be called as “cute” questions. The problem raised here is one such question. :)

Here’s the question: Given a weighted connected graph, G, does Kruskal’s algorithm produce all possible Minimum Spanning Trees of G?

A couple of lines on Kruskal’s algorithm: In order to find the MST, you take the edges and order them in non-decreasing order. Then traverse this sorted sequence of edges, picking edges one by one, keeping in mind that the picked edge should not create a cycle with any of the previously picked edges.

There’s also another interesting fact about MSTs: If a graph has unique edge weights, i.e. no two edge weights are same, then that graph also has a unique MST. Also, if a graph has repetitions of certain edge weights, then there may exist more than one MST for the graph.

Now, back to the original question.

Let us take an example graph to understand the arguments that follow.

Suppose our connected graph, G has the following ordered sequence of edge weights:

1 (e1), 2 (e2), 3 (e3), 4 (e4), 5 (e5), 5 (e6), 5 (e7), 6 (e8), 7 (e9), 8 (e10), 8 (e11), 8 (e12), 8 (e13), 9 (e14), 10 (e15), …. , (em).

(The parenthesized symbols are the edge names)

Now, we apply Kruskal’s algorithm on G to get some MST. Let’s call this as Tk.

Also, let T be any random MST of G. We shall compare T with Tk to see whether T can be produced by Kruskal’s algorithm.

Simple till now, right?  :)

Now, T and Tk will generally have some common edges. (Even if they don’t the ensuing arguments will hold just as well.)

Let us consider the edges one by one (in sorted order). Let’s say that edge, e1 is part of both T and Tk. Similarly, edges, e2 and e3 are both of both T and Tk. Also, let us assume that edge, e4 is not part of either T or Tk. So far so good; all edges thus far between T and Tk are common.

Case 1: Now, let us assume that Tk contains edge, e5, but not e6; while T contains edge, e6, but not e5.

We now have to prove that Kruskal’s algorithm can produce an MST which contains e6 but not e5.

(We know that Kruskal’s algorithm selects an edge to be a part of the MST in sorted order. When two or more edges have equal weight, it randomly picks one of them and checks whether that edge can be a part of the MST. Only then does it go to the next edge of equal weight to test whether that is compatible with the tree constructed thus far.)

So, in our example, after Kruskal’s algorithm successfully processes edge, e4, it has to make a choice between e5 and e6. Let us suppose that it picks edge, e6. We know that e6 would be compatible with the previously made selections of e1 through e4 (this is because, T contains the same selection as Kruskal for e1 through e4, and also contains e6. Hence, e6 is compatible with the earlier selections i.e. it does not form a cycle with those edges).

So, one problem is sorted out. There does exist a Kruskal-produced MST, call it Tk1, having edge, e6. But, we need to ensure that this MST does not contain e5. This can be easily proved.

Here’s the proof: We know that  { e1, e2, e3 } is compatible with both e5 as well as e6. (e4 is part of neither T nor Tk, in our example). Now, Kruskal’s algorithm always tests an edge along the ordered sequence in order to accept or reject it. Since, e6 was not part of Tk, it means that { e1, e2, e3, e5, e6} form a cycle. Coming back to Tk1: e6 can’t be part of Tk1 since it forms a cycle with the previously picked edges {e1, e2, e3, e5}.

So, we have proved that if a Kruskal-produced MST and a random MST differ in an edge of the same weight, then we can apply Kruskal’s algorithm to rectify this difference.

Case 2: Let’s get back to the sample edge sequence (reproduced here for our convenience):

1 (e1), 2 (e2), 3 (e3), 4 (e4), 5 (e5), 5 (e6), 5 (e7), 6 (e8), 7 (e9), 8 (e10), 8 (e11), 8 (e12), 8 (e13), 9 (e14), 10 (e15), …. , (em).

Again, let Tk be a Kruskal-produced MST, and T be a random MST of G.

Let’s assume that T and Tk agree on edge selections until edge, e7. Also, let us assume that Tk does not contain e8, while T contains e8.

We can easily prove that this is not possible. The two facts that T agrees with Tk on edges e1 through e7, and that e8 is part of T, imply that e8 is compatible with Kruskal’s selection edges from e1 through e7. By definition, Kruskal’s algorithm must accept e8. Hence, the fact that e8 is not part of Tk is not possible. Hence, case 2 is an impossibility.

Case 3: Consider the same edge sequence:

1 (e1), 2 (e2), 3 (e3), 4 (e4), 5 (e5), 5 (e6), 5 (e7), 6 (e8), 7 (e9), 8 (e10), 8 (e11), 8 (e12), 8 (e13), 9 (e14), 10 (e15), …. , (em).

T and Tk notations are also inherited from above.

Assume that T and Tk agree on edge selections from e1 through e9. Also, assume that T contains two edges of weight, 8, say e10 and e11. On the other hand, assume that Tk contains only one edge of weight, 8, say e12.

To rectify the fact that Tk contains e12 but not e10, we apply the same arguments as in Case 1. Hence, we get that there can be a Kruskal-produced MST, call it TK1, that contains e10 but not e12.

Now, can e11 be made to be a part of a Kruskal-produced MST that contains e10 but not e12? The arguments here are similar in nature to those in Case 2. We can prove that such a case can not exist.

Case 4: Here’s the same sequence:

1 (e1), 2 (e2), 3 (e3), 4 (e4), 5 (e5), 5 (e6), 5 (e7), 6 (e8), 7 (e9), 8 (e10), 8 (e11), 8 (e12), 8 (e13), 9 (e14), 10 (e15), …. , (em).

Using the same meaning for T and Tk, we shall assume that T and Tk agree on edge selections from e1 through e8. Also, assume that Tk contains e9, while T does not contain e9.

Add edge, e9 to T. We get a cycle. It can be observed that this cycle will contain an edge with index greater than 9. i.e. one edge from among the set {e10, e11, …, em}. (This is  because, if e9 formed a cycle composed exclusively of edges selected by Kruskal from e1 through e8, it would not have been a part of Tk in the first place. Hence, the presence of a higher-indexed edge is essential in the cycle).

Now, we can remove the higher-indexed, and hence, higher-weighted edge, say  f, from the cycle and replace it in the MST, T by e9. This new tree, T – {f} + {e9}, is lower in weight than T. This is not possible. Hence, Case 5 is also an impossibility.

Case 5: Here’s the same edge sequence:

1 (e1), 2 (e2), 3 (e3), 4 (e4), 5 (e5), 5 (e6), 5 (e7), 6 (e8), 7 (e9), 8 (e10), 8 (e11), 8 (e12), 8 (e13), 9 (e14), 10 (e15), …. , (em).

Assume T and Tk agree on edge selections from e1 through e9. There can be a possibility that Tk contains 2 or more edges of weight 8, while T contains just one of them. We can show by arguments similar to those in Case  4 that such a possibility does not exist.

The cases that were considered were chosen so as to represent all possible types of interesting situations that might arise. There may be other cases possible, but hopefully it would reduce to one of these.

Explanation of this problem through an example makes, I believe, it easier for the reader and certainly, the writer. :) It is indeed a very long explanation, and probably one may want to think of the answer themselves.

So, Kruskal’s Algorithm does produce all possible MSTs of a graph G.

Some Practical Implications of this Result:

Whenever one is asked to prove that a given spanning tree is not an MST, we just need to show that the given tree can not be produced by Kruskal’s algorithm.

We had seen an interesting fact at the beginning of this post, that uniqueness of edge weights implies uniqueness of MST. (Note that we did not use this fact anywhere in showing that Kruskal’s algorithm produces all MSTs.) This is easily proven since Kruskal’s algorithm produces only one MST when run on a graph with unique edge weights, implying that only one MST exists for such a graph.

Jokes: Methods of proof

How to prove it

(from http://www.workjoke.com/mathematicians-jokes.html )

proof by example:
The author gives only the case n = 2 and suggests that it contains most of the ideas of the general proof.
proof by intimidation:
“Trivial.”
proof by vigorous handwaving:
Works well in a classroom or seminar setting.
proof by cumbersome notation:
Best done with access to at least four alphabets and special symbols.
proof by exhaustion:
An issue or two of a journal devoted to your proof is useful.
proof by omission:
“The reader may easily supply the details”
“The other 253 cases are analogous”
“…”
proof by obfuscation:
A long plotless sequence of true and/or meaningless syntactically related statements.
proof by wishful citation:
The author cites the negation, converse, or generalization of a theorem from the literature to support his claims.
proof by funding:
How could three different government agencies be wrong?
proof by eminent authority:
“I saw Karp in the elevator and he said it was probably NP-complete.”
proof by personal communication:
“Eight-dimensional colored cycle stripping is NP-complete [Karp, personal communication].”
proof by reduction to the wrong problem:
“To see that infinite-dimensional colored cycle stripping is decidable, we reduce it to the halting problem.”
proof by reference to inaccessible literature:
The author cites a simple corollary of a theorem to be found in a privately circulated memoir of the Slovenian Philological Society, 1883.
proof by importance:
A large body of useful consequences all follow from the proposition in question.
proof by accumulated evidence:
Long and diligent search has not revealed a counterexample.

Probabilistic Method

This is a method by which we can prove that a combinatorial object exists with non-zero probability. The interesting thing is that we need not even find any combinatorial object in order to prove its existence.

We’ll consider 2 problems here: the max-SAT and the max-cut problem.

(1) Max-cut: Find a cut in the graph with the maximum number of edges crossing the cut.

This problem is NP-hard. But we can accomplish another thing easily. We can prove that there exists a cut with at least m/2 edges crossing it. (m and n are the edges and vertices in the graph, respectively).

How do we prove that? Simple. Use the probabilistic method.

Assign each vertex of the graph, G independently and equiprobably to one of two vertex subsets. Now, consider any edge (u,v) belonging to G. What is the probability that its ends are in different subsets? Half. (Why? Because, there are 4 cases,(1) u is in subset, A and v is in subset, B; (2) u is in subset, A, and v is in subset, A; (3) u is in subset, B, and v is in subset, A; (4) u is in subset, B, and v is in subset, B. Each of these 4 events has probability 1/4. )

So now, what is the expected number of edges of G  crossing the cut? Answer: m/2 (by linearity of expectation.)

We know one simple fact: If the expected number of edges is m/2, then there exist cuts having at least m/2 edges crossing the cut. (Same as saying: if class average is 25, then  someone got at least 25).

Using the above fact, what have we just proved? There exists a cut having at least m/2 edges.

(2) Max-SAT: We have a set of m clauses, each in conjunctive normal form. What is the maximum number of clauses that can be satisfied?

We wont solve that. Instead, we’ll prove that there exists an assignment of truth values such that at least m/2 clauses are satisfied.

Proof: There are n variables in all, and m clauses. Assign truth values independently and equiprobably to each variable. Consider any one clause which has k variables. What is the probability that this clause is not satisfied? Answer: At most 2^(-k).

So, the probability that the clause is satisfied is at least 1 – 2^(-k), which is greater than or equal to 1/2.

Again, by linearity of expectation, the expected number of satisfied clauses is at least m/2.

Using the probabilistic method, we’ve proved the existence of two combinatorial objects. For more on this, read Ch. 5 of Randomized Algorithms by Motwani and Raghavan.

Shortest Paths

A path, as the name suggests connects 2 vertices in a graph. A shortest path is a path of minimum length.

If we have a particular vertex, say s, and need to find shortest paths from s to all other vertices of the graph. It is called the Single Source Shortest Path Problem (SSSP).

We may also want to find shortest paths between every possible pair of vertices in the graph. This is the All Pairs Shortest Paths Problem (APSP).

How do we find a shortest path between 2 vertices, say s and v?

Simple. Proceed in a manner that would remind us of Prim’s algorithm, which was discussed in the previous post.

We find the shortest path by sort of growing a cloud. Initially, we got only s. Now, choose that vertex which is closest to s. So now the cloud has 2 vertices, and the “diameter” of the cloud is the distance of this newly-added vertex, say x,  from s.

Next step. Now, find a vertex that is father from s than x, but is closer to s than any of the other vertices (or at least no farther than any of them). Call this vertex, y. Now, the path from s to y, will either be direct (i.e. one edge) or would be the addition of edge (x,y) to the path from s to x.

We continue in the same manner. Lets see one more step.

Suppose that the most recent vertex added to the cloud is d, and that the shortest path from s to d is s-> a -> b -> c -> d. How do we choose the next vertex to be added to the cloud?

Select the vertex that is farther from s than d (or at least no closer) and is closer to s than any other vertex outside the cloud (or at least no farther).  This vertex, say e, will have a shortest path from s, that is either a direct path (s->e), or an indirect one (s -> a -> e, or s-> a ->b ->e, or s-> a -> b ->c ->e, or s-> a -> b -> c-> d-> e).

Keep on adding vertices to the cloud until vertex v, also enters the cloud.

For the SSSP problem, keep adding vertices until the all vertices have been added.

This is known as Dijkstra’s algorithm.

It’s quite simple and also very well-known. Incidentally, it was discovered 50 years ago (1959). :)

Minimum Spanning Tree

A Minimum Spanning Tree is a tree that includes all the vertices of the graph. (Tree is a graph that is connected and has no cycles.)

So how do we find a MST of a graph? Standard textbook approach would be to use a greedy algorithm.

(Greedy algorithm: An algorithm where thinking greedily helps you to find the solution :) i.e. local optimal solutions lead to a global optimal solution.)

We’ll see three ways to find the MST.All are greedy.

1. Take the entire graph and start deleting edges one by one. Now, we have to be greedy. So, select the heaviest edge that is left in the graph, and delete it. Now, we also have to ensure that the resultant graph spans the graph. So, do not delete any edge that causes the graph to become disconnected; instead check for the next lightest edge to delete.

So, keep deleting edges starting from the heaviest to the lightest keeping in mind the above rules.

Till when do we continue this deletion??  Well, do it till you can delete no more edges without disconnecting the graph. Thats it. We have a MST.

2. Initially MST is empty. Add edges to this to get the MST. How to add edges? Again, greedily. Take the lightest edge. Add it to the MST. Now, we also need to ensure that the resultant MST remains a tree, i.e. there should be no cycles. So, while adding edges, do not add edges that cause the MST to have a cycle. Instead, go to the next heavier edge to add.

So, keep adding edges successively starting from the lightest to the heaviest, keeping in mind the above rules.

When do you stop? Simple; when addition of any more edge to the MST creates a cycle.

3. Initially the MST is empty. And again we’ll add edges greedily. Add the lightest edge to the MST. You get a connected component of 2 vertices, right? Now add the lightest edge that is incident upon this connected component. Ensure that the added edge does not create a cycle in the MST. In case the edge creates a cycle, skip adding it and proceed to the next heavier edge incident upon the connected component.

Keep adding till no more edges can be added without creating a cycle. There’s the MST.

The 2nd algorithm is called Kruskal’s algorithm, and the 3rd is called Prim’s algorithm.

Funny Math

Well here are some maths jokes. These are taken from http://www.math.utah.edu/~cherk/mathjokes.html and from http://www.math.ualberta.ca/~runde/jokes.html. These sites have got many more of these.

An infinite crowd of mathematicians enters a bar.
The first one orders a pint, the second one a half pint, the third one a quarter pint…
“I understand”, says the bartender – and pours two pints.

———————————————————————————————————-

Q: When did Bourbaki stop writing books?
A: When they realized that Serge Lang was a single person…

———————————————————————————————————-

Classification of mathematical problems as linear and nonlinear is like classification of the Universe as bananas and non-bananas.

———————————————————————————————————-

A tragedy of mathematics is a beautiful conjecture ruined by an ugly fact.

———————————————————————————————————-

The difference between an introvert and extrovert mathematician is: An introvert mathematician looks at his shoes while talking to you. An extrovert mathematician looks at your shoes.

———————————————————————————————————-

An engineer, a physicist and a mathematician are staying in a hotel.
The engineer wakes up and smells smoke. He goes out into the hallway and sees a fire, so he fills a trash can from his room with water and douses the fire. He goes back to bed.

Later, the physicist wakes up and smells smoke. He opens his door and sees a fire in the hallway. He walks down the hall to a fire hose and after calculating the flame velocity, distance, water pressure, trajectory, etc. extinguishes the fire with the minimum amount of water and energy needed.

Later, the mathematician wakes up and smells smoke. He goes to the hall, sees the fire and then the fire hose. He thinks for a moment and then exclaims, “Ah, a solution exists!” and then goes back to bed.

———————————————————————————————————-

A math student is pestered by a classmate who wants to copy his homework assignment. The student hesitates, not only because he thinks it’s wrong, but also because he doesn’t want to be sanctioned for aiding and abetting.
His classmate calms him down: “Nobody will be able to trace my homework to you: I’ll be changing the names of all the constants and variables:
a to b, x to y, and so on.”
Not quite convinced, but eager to be left alone, the student hands his completed assignment to the classmate for copying.
After the deadline, the student asks: “Did you
really change the names of all the variables?”
“Sure!” the classmate replies. “When you called a function
f, I called it g; when you called a variable x, I renamed it to y; and when you were writing about the log of x+1, I called it the timber of x+1…”

———————————————————————————————————-

Q: How does one insult a mathematician?
A: You say: “Your brain is smaller than any >0!”

———————————————————————————————————-

There are 10 kinds of mathematicians. Those who can think binarily and those who can’t…

———————————————————————————————————-

Q: What is the difference between a Ph.D. in mathematics and a large pizza?
A: A large pizza can feed a family of four…

———————————————————————————————————-

A chemist, a physicist, and a mathematician are stranded on an island when a can of food rolls ashore. The chemist and the physicist comes up with many ingenious ways to open the can. Then suddenly the mathematician gets a bright idea: “Assume we have a can opener …”

———————————————————————————————————-

The physicist and the engineer are in a hot-air balloon. Soon, they find themselves lost in a canyon somewhere. They yell out for help: “Helllloooooo! Where are we?”
15 minutes later, they hear an echoing voice: “Helllloooooo! You’re in a hot-air balloon!!”
The physicist says, “That must have been a mathematician.”
The engineer asks, “Why do you say that?”
The physicist replied: “The answer was absolutely correct, and it was utterly useless.”


Note on the Formula for PI

In the previous post, we saw that PI = Lim (theta->0) 360 *  sin(theta/2) / theta. (Here, theta is in degrees.)

Now, we all know that standard result: : Lim (x -> 0) sin x / x = 1.  (Of course, x is in radians here.)

Important note: We’ll use Boldface for values in radians, and normal for degrees.

At first glance it may seem that Lim (theta->0) 360 *  sin(theta/2) / theta = 1/2 * 360 = 180, and not PI.

So how can we verify that our formula for PI is actually correct?

The reason is rather simple. It is because: Lim (x->0) sin x / sin x = PI / 180. ( Numerator: x in degree; Denominator: x in radians).

Proof:

Lim (theta->0)  sin theta/ sin theta ( Numerator: theta in degree; Denominator: theta in radians).

=  Lim (theta->0)   sin ( PI * theta / 180) / sin theta.    [ Now both numerator and  denominator have value of the angle in radians.]

= Lim (theta->0)   [sin (PI * theta / 180) / (PI * theta / 180)]   *  [ (PI * theta / 180) / sin theta]

Now, Lim (theta->0) [sin (PI * theta / 180) / (PI * theta / 180)] = 1  ( We can use the standard limit formula here since the angle is in radians.)

and,  Lim (theta->0) [ (PI * theta / 180) / sin theta] =  PI / 180.  (Again we can use the formula since angle is in radians.)

therefore,  Lim (theta->0) sin theta / sin theta = PI / 180.

Using this, the formula for PI in the previous post can be verified. :)

Formula for PI

The ancient mathematicians calculated the value of PI to a number of significant digits.  Everyone knows the definition  of PI: ratio of circumference of a circle to it’s diameter.

Now, how did the value 3.1415… come? How might have the first person calculated the numerical value of PI?

Key idea: We need a good approximation to the value of circumference. How do we do that? Take an infinitesimally small arc and approximate its length to that of the corresponding chord.

Consider an arc (AC) of the circle describing angle : theta (in degrees) at the centre of the circle (B)

.    A
.      /
r   /
. /   Angle B=theta
/——————-
B             r       C

Here, r is the radius.

We can complete the triangle by joining AC.

Now length of side AC =   sin(theta) * [ r / sin[(180-theta)/2] ] (Using the sine formula on triangle ABC, and since  angle A = angle C = (180 – theta) / 2])

Now we can use simple results to get :    length of side AC = 2 * r * sin(theta/2).

( since  : sin [(180 - theta)/2] = sin (90 – theta/2) = cos (theta/2)   &  sin(theta) = 2 sin(theta/2) cos(theta/2). )

Key idea (once again): As (theta->0),  length of side AC -> length of arc AC;
and circumference of circle = length of arc AC * (360/theta)

Hence, ratio of circumference to diameter = PI=  Lim (theta ->0) (360/ theta) * (length of arc AC) / (2 * r)

= Lim (theta ->0) (360/theta) * [2 * r * sin(theta/2)]  / [2*r]

= Lim (theta->0) 360 *  sin(theta/2) / theta.
So, there we have it: PI = Lim (theta->0) 360 *  sin(theta/2) / theta.

We can check using a calculator:

For theta =  1:    PI = 360 * sin(0.5) / 1.0  = 3.14155

For theta = 0.002:   PI = 360 * sin(0.001) / 0.002 = 3.14159.

Number patterns

Number theory has amazing patterns. Well, here I’ll give some interesting ones on sums of digits.

Sum of digits (s.o.d) of a natural number is found simply by computing the number modulo 9. (eg. s.o.d 157= 4)

Consider powers of  2. The sums of digits of the powers follow an interesting cycle- 1, 2, 4, 8, 7, 5.

1*2 = 2

2*2=4

4*2=8

8*2=16=7 (mod 9)

16*2=32=5 (mod 9)

32*2=64=1 (mod 9)

And again the same cycle repeats after 64. (s.o.d 128 = 2; s.o.d 256 = 4, and so on.)

Well, intuitively the proof is simple.

It follows from the result that s.o.d ( product of 2 numbers) = s.o.d of 1st number * s.o.d of 2nd number.

i.e. if we have 2 natural numbers, x and y; then s.o.d (x*y) = s.o.d (x) * s.o.d (y).

The result on product of s.o.d of numbers can be broken down further; that is, the above result can be obtained from the fact that s.o.d ( sum of 2 numbers) = s.o.d of 1st number + s.o.d of 2nd number.

From the result on product of sums of digits, the pattern stated above for powers of 2 is easily obtained by induction.

Well, we can state similar pattern for different numbers:

The cycle for the powers of 5 is 1, 5, 7, 8, 4, 2.

Now, we can see an interesting design- the cycle for powers of 5 is “opposite in direction” to the cycle for powers of 2.

Well, this is actually to be expected, and one can easily see intuitively why this is so. I leave it for the reader to figure it out. (Hint: 2^n * 5^n = 10^n = 1 (mod 9) ).

So the cycles for 2 and 5 are of lengths 6 each. The cycle length for powers of 7 is 3 (Cycle is  {1, 7, 4}.)

The cycle for 11 is 1, 2, 4, 8, 7, 5. Hey, this is the same as that for 2. Why? 11= 2 (mod 9). That’s why. If 2 numbers are equal modulo 9, they will have the same cycle, right.

Follow

Get every new post delivered to your Inbox.