Software engineering interview questions – Intersection of two arrays

A few approaches are presented herein for solving the question on computing the intersection of two arrays. What follows is also attached in the pdf format here – algorithm1


Advertisements

Strings in C++ : Find the first non-repeated character in a string – 3

APPROACH 3

We can approach this problem using a lookup table. Using a lookup table, the running time can be significantly reduced.

For simplicity, let us assume that our input string can only contain characters from the English alphabet i.e. only 26 characters are possible, though each may be repeated any number of times.

The idea is to construct a lookup table having space for 26 entries. Each entry in the lookup table corresponds to a character in the input string.

We will proceed as follows— Create a lookup table of size 26. Initialize each entry in the lookup table to be 0. The entries in the lookup table correspond to the 26 characters in the English alphabet, and are equal to the number of times that particular character appears in the input string.

Traverse the input string. Say the string that you are given is “elephant”. The first character is ‘e’. Go to the lookup table entry corresponding  to ‘e’ and increment the count in that entry by 1. But which is the entry in the lookup table corresponding to ‘e’?

Let us call the lookup table L.

L[0] will store the count of ‘a’. L[1] will store the count of ‘b’. … L[25] will store the count of ‘z’.

So, for ‘e’, we go to L[4] and increment that count by 1.

Now, once we have finished traversing the string, and have the counts of all characters available to us in L, we need to output the first non-repeated character.

For this, we proceed as follows:

Traverse the input string again. Take the first character in the input string. Go to the entry corresponding to that character in L. (Note that we can do this in O(1) time.) Check if the count is 1. If so, output it. Else, go to the next character in the input string, and repeat the same procedure.

————

Now, what would be the running time of this procedure?

Initializing the look up table takes O(1) [since the size of the lookup table is a constant.]

Traversing the string to populate the lookup table with appropriate counts takes O(n) time. (n is the length of the input string.)

Traversing the string again, to output the first non-repeated character takes O(n) time.

Hence, the total running time is O(n).

The memory requirement is that of a lookup table of constant size, which is O(1).

———

In C++, we can code it as follows:

——–

#include <iostream.h>

//func f returns the first non-repeated character in s.

char f ( char *s)

{

//check for null string.

if ( ! s)

return ”;

int L [26] = {0}; //lookup table, initially all entries are 0.

//we now traverse the string to populate L.

char *t = s;

while (  *t != ”)

{

L [ *t – ‘a’]++; // e.g. if *t is ‘e’, we need to increment L[4] which is L [‘e’ – ‘a’].

t++;

}

//we have now populated L with counts.

//we now traverse s and using L, output the first non-repeated char.

t = s;

while ( *t != ”)

{

if ( L [ *t – ‘a’ ] == 1)

return *t;

}

//no non-repeated char in s.

return ”; //we may also write: return 0; (using the ASCII value of NUL)

}

———-

Now, what if remove the simplifying assumption made earlier and say  that s can contain any character. In this case, the lookup table needs to have 128 entries (one for each ASCII character).

For the preceding program, we use L [ ch – ‘a’ ] to access the lookup table entry corresponding to ch. In the new lookup table having 128 entries, we can use the ASCII values of the characters as direct indices into the lookup table, i.e. we can use L[ch] to access the count of character ch.

The remaining part of the code would remain more or less the same as above.

———-

Strings in C++ : Find the first non-repeated character in a string – 2

APPROACH 2

The previous approach consider each character of the string and then compared it with each of the remaining characters (albeit only those following it) to check if it gets repeated anywhere. Hence it took O(n^2) time. We can do better than that by sorting the string. Once the string is sorted, checking if any given character in that string gets repeated becomes a simple matter of checking the elements adjacent to it.

What we propose to do is this:

– Copy the original string into a new string.

– Sort this new string. (This will take O(n \log n) time. )

But now we are faced with a problem.

We need to determine a non-repeated character in the original or new string. If we wanted just any non-repeated character, that would have been simple. — We would traverse through the sorted new string, and in O(n) time, we would have been able to output a non-repeated character in that string. Hence, the total running time would have been O(n \log n). However, what we want is  the first non-repeated character.

One idea could be to proceed as follows: Take the first character in the original string. Using  the sorted string, check if that character is being repeated. If not, return that character. Else, go to the next character in the original string and repeat the same process again.

But again, we are faced with a problem — How do we determine the location of the first character of the original string in the new sorted string in constant time? (We can, of course, do so in O(n) time. But that would only lead to an O(n^2) algorithm, which would mean that even after sorting we weren’t able to improve the running time.)

—–

From the above, we observe 2 things —

1. Sorting would help in finding repeated / non-repeated characters in a string.

2. But sorting, as such, would not help us efficiently find the first non-repeated character, because we cannot perform a constant time lookup for any given character in the sorted string.

The solution is somewhat clear given the above observations. Do a sort akin to bucket sort. i.e. keep a bucket for each character that a string can possibly have. Once you have that array of buckets each corresponding to one possible character, you can perform constant time lookup in that array for any given character. i.e. we can construct a lookup table. This leads us to the approach detailed in the next post.

Strings in C++ : Finding the first non-repeated character in a string

We are given a string, and told to find the first non-repeated character in it. Let’s look at an example: let’s say our string is “alligator”. The first non-repeated character in this string is “i”.

Now, how do we go about finding this first non-repeated character programmatically?

APPROACH 1

There is a very direct algorithm which we can employ — Take the first character of the given string (which is ‘a’ in our example string “alligator”). Compare it with each of the other characters in the string. If that character does not appear anywhere else, return that character and terminate the algorithm. Else, if the character does appear somewhere else in the string (as in the case of the ‘a’ in “alligator”), go to the 2nd character of the string, and repeat the same process (i.e. check if the 2nd character of the string gets repeated anywhere). Continue this process for each character of the string (i.e. in the k th iteration, we would compare the k th character of the string with each character in the string to determine if it appears anywhere else). If, at the end of the algorithm, we find that all characters get repeated somewhere, i.e. there is no non-repeating character, print a message to that effect.

———

The above algorithm is simple to understand and implement. It will run in O(n^2) time where n is the length of the input string.

——

In pseudocode, the algorithm would look as follows:

——-

Input: String s of length n.

Output: First non-repeated character in s.

Algorithm:

For i going from 0 upto n-1:

flag = true; // flag is a boolean variable initialized to true.

For each j going from 0 upto n-1:

If ( s[i] == s[j] )

flag = false;

End For. //end the inner For loop.

If ( flag == true ) //i.e. if flag is true when we come out of the inner For loop

return s[i];  //character s[i] is the 1st non-repeated character in s.

End For. // end the outer For loop.

//If the algorithm reaches this statement, it means that there is no non-repeated character in s.

// Hence we return a default character, which would not otherwise be returned.

// We choose the null character — NUL for this purpose.

return NUL;

———

So that was the pseudocode for the algorithm. We can translate the above code into C++ as follows:

———

#include <iostream.h>

// function f takes a string s, and returns the first non-repeated character in s.

// In case there is no non-repeated character, it returns the null character ” (having ASCII value 0)

char f (char *s)

{

//we first check if s is the null string.

if ( !s)

return ”; // return the null character is s is the null string.

int i = 0, j;

bool flag;

while ( s[i] ! = ”)

{

flag = true;

j = 0;

while ( s[j] ! = ”)

{

if ( s[i] == s[j] )

flag = false;

j ++;

}

if (flag)

return s[i];

i++;

}

return ”; //i.e. no non-repeated character was found.

}

————


Dynamic programming – Probability, gambling and a die

Let’s say we have a fair 6-sided die. Player A can earn money based on what shows up when the die is rolled. The rule of the game is as follows: Player A will roll a die upto 3 times. After each roll, Player A has two choices: (1) He can take money in dollars equal to the number showing on the upturned face of the die. (i.e. if 5 shows on the die, he can take $5.)  In this case, the die is not rolled any more, and the game ends. (2) He can decide to discard this roll of the die, and decide to roll the die again. Note that the die can be rolled at most 3 times.

Question — What is the expected payoff for Player A?

——–

Let’s say that instead of 3 rolls of the die, the game allowed only for 1 roll. What is the expected payoff for Player A in this case? It is clearly equal to the expected value of the number that shows up when the die is rolled. And since the die is fair, this expectation is equal to (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. And Player A has no choice but to accept whatever shows up on the die because there is only one roll possible. Hence, in this game, the expected payoff for Player A is $3.5.

Now, consider what happens if instead of 1 roll, we allow for upto 2 rolls.

Now, if the 1st roll is a 6, obviously Player A would pocket $6 and decide to end the game there itself. Also, if the 1st roll is a 1, Player A can always decide to roll again [because irrespective of what shows up on the 2nd roll, it cannot be less than $1, and hence Player A has nothing to lose by deciding to discard the 1st roll and opting to roll again.]

Now consider what happens if the 1st roll is a 3. Now, Player A is faced with a predicament. What should he do? Should he decide to stop there and go home with $3, or should he discard this 3 in the hope of higher returns in the 2nd and final roll?

This is where having learnt probability well in school helps.

We know that the expected value of the number that shows up when the die is rolled once is 3.5.

Hence, if in the 1st roll, Player A gets a number greater than 3.5 i.e. either 4, 5 or 6, he can decide to stop there and take the corresponding amount of money.

However, if he gets either 1, 2 or 3 in his 1st roll, he should discard that, and decide to roll again, because the expected value of the 2nd roll is 3.5.

Hence, the expected payoff for Player A from a game which allows for at most 2 rolls is: \frac{3}{6} 3.5 + \frac {1}{6} 4 + \frac{1}{6} 5 + \frac{1}{6} 6 = 4.25.

(Here, 3/6 is the probability of getting a 1, 2 or 3 on the 1st roll, in which case the expected payoff becomes 3.5. )

——-

Now, coming back to our original game, we can decide as to how we should proceed after the die is rolled once.

We know that the expected payoff from 2 rolls of the die is $4.25.

Hence the strategy of Player A should be the following:

– If the 1st roll of the die yields either a 5 or a 6, take the money, and end the game.

– Else, discard the 1st roll, and continue the game.

The expected payoff can be computed easily based on the preceding observations:

Expected payoff for Player A = \frac{4}{6} 4.25 + \frac{1}{6} 5 + \frac{1}{6} 6 = \frac {14}{3}.

———-

This is a classic question using the principle of dynamic programming. Let’s say the the game allows for the die to be tossed upto n times.

Let E[k] denote the expected payoff for Player A in a game that allows upto n-k rolls of the dice.

The strategy for Player A after roll k should be the following:

— If the number showing on the dice is greater than E[k], take the money and stop the game.

— Else, discard that roll, and decide to continue on to the next roll.

——-

The recurrence relation for computing the E[k] values can be derived in terms of E[k+1] as was done in the case of the 3-roll example that we saw earlier. Once we have done that, we can calculate the values in the order, E[n-1], E[n-2] and so on uptil E[0].

——-

Dynamic Programming – Maximum sum contiguous subsequence

We are given an array A having n integers. What we want to find is a contiguous subseqnence (A[i], A[i+1], …, A[j]) such that the sum of the elements in that subsequence is maximized. (Note that, in general, there may be more than one such optimal subsequence.)

Note that if all elements in A are non-negative, we could directly return the entire array as the required optimal subsequence.

APPROACH 1

A simple brute-force method could be to compute the sums of all possible subseqnences in A. How many subsequences are there? There are n+ (n-1) + (n-2) + ... + 1 = O(n^2) possible subsequences. Computing the sum of a subsequence takes O(n) in the worst-case. Hence, the running time of the algorithm would be O(n^3).

In C++, we could write the following function to do what was explained above:

// start and end are the starting and ending indices of an optimal subsequence.

void f ( int* A, int n, int &start, int& end)

{

int sum, max = A[0];

for (int i = 0; i < n ; i++)

for (int j = i; j < n; j++)

{

sum = 0;

for (int k = i; k <=j; k++)

sum+= A[k];

if (sum >= max)

{

start = i;

end = j;

max = sum;

}

}

}

————

APPROACH 2:

We can improve the running time to O(n^2) by being a bit more clever in computing the sums of different subsequences. We observe that the sum of the subsequence A[i … j+1] = A[j+1] + sum of the subsequence A[i … j].

In C++, we could write as follows:

void f (int *A, int n, int &start, int &end)

{

int sum, max = A[0];

for (int i = 0; i < n; i++)

{

sum = 0;

for (int j = i; j < n; j++)

{

sum + = A[j];

if (sum >= max)

{

start = i;

end = j;

max = sum;

}

}

}

}

———–

APPROACH 3:

Using dynamic programming, we can solve the problem in linear time.

We consider a linear number of subproblems, each of which can be solved using previously solved subproblems in constant time, this giving a running time of O(n).

Let S[k] denote the sum of a maximum sum contiguous subsequence ending exactly at index k.

Thus, we have that:

S[k+1] = max \{S[k] + A[k+1], A[k+1] \} (for all 1 \leq k \leq n-1)

Also, S[0] = A[0].

——–

Using the above recurrence relation, we can compute the sum of the optimal subsequence for array A, which would just be the maximum over S[i] for 0 \leq i \leq n-1.

Since we are required to output the starting and ending indices of an optimal subsequence, we would use another array T where T[i] would store the starting index for a maximum sum contiguous subsequence ending at index i.

In prseducode form, the algorithm would look thus:

Create arrays S and T each of size n.

S[0] = A[0];

T[0] = 0;

max = S[0];

max_start = 0, max_end = 0;

For i going from 1 to n-1:

// We know that S[i] = max { S[i-1] + A[i], A[i] .

If ( S[i-1] > 0)

S[i] = S[i-1] + A[i];

T[i] = T[i-1];

Else

S[i] = A[i];

T[i] = i;

If ( S[i] > max)

max_start = T[i];

max_end = i;

max = S[i];

EndFor.

Output max_start and max_end.

———–

The above algorithm takes O(n) time and O(n) space.

——–

We can however improve upon the space requirements, reducing it to O(1). The key idea is this: for computing S[i] and T[i], all we need are the values of S[i-1] and T[i-1] apart from the given array A. Hence, there is no need to store all n values of the S and T arrays.

We could proceed as follows:

max = A[0];

max_start = 0;

max_end = 0;

S = A[0];

T = 0;

For i going from 1 to n-1:

// S = max { S + A[i], A[i] )

if ( S > 0)

S = S + A[i];

Else

S = A[i];

T = i;

If ( S > max)

max_start = T;

max_end = i;

max = S;

End For.

Output max_end and max_start.

———–

The above algorithm takes O(n) time and O(1) space.

———-

Greedy algorithms – Coin changing using minimum number of coins

We have coins of denominations 1 cent, 5 cents and 25 cents with us. What we would like to do is represent a total of n cents using these coins, such that a minimum number of coins are used.

For example, if we want to represent 31 cents, we could do that by using 1 25-cents coin, 1 5-cents coin and a single coin of 1-cent, thereby utilizing a total of 3 coins. And this indeed is the bast we can do. Why is that so? Let’s look at it this way — the current solution uses one 25-cent coin, one 5-cent coin and one 1-cent coin. One way in which we can try to modify this solution would be to decide to not use any 25-cent coin at all (instead of the one coin that we are currently using). That would mean that we would have to represent the shortage of 25 cents that we are now faced with, using coins of denominations 1 and 5-cents. We can see that we would need at least five coins of denomination 5-cents in order to represent 25-cents. Hence, using a 25-cent coin was actually a wise choice enabling us to save on the number of coins used.

Now, what if we had to represent 57 cents. The best solution would be to use two coins of 25-cents, one coin of 5-cents and two coins of 1-cent.

Again, we can intuitively infer its optimality by observing that:

1. if we use lesser number of 25-cents coins than what we are currently using, we would actually end up using more coins. —(If we use just one 25-cent coin, then we would require 5 coins of 5-cents each to cover up the deficit of 25-cents. Similarly if we do not use any 25-cents coins at all, we would need 10 coins of 5-cents each to cover the deficit.)  Hence, the idea is to use as many 25-cent coins as we possibly can.

2. Similarly, once we have used the maximum number of 25-cent coins, we are left with k cents (where k < 25) to represent using 1 and 5-cent coins. Analogous to what we observed above, we can infer that we must use as many 5-cent coins as we can in order to cover the deficit of k cents. Whatever is left can be represented using 1-cent coins.

———

Hence, the algorithm would look something like this.

//Represent n cents using the least number of total coins where coins are of denominations – 1, 5 and 25-cents.

1. Divide n by 25 to get quotient q1 and remainder k1.

2. Use q1 coins of 25-cents each.

3. If k1 == 0, we are done. Else, divide k1 by 5 to get quotient q2 and remainder k2.

4. Use q2 coins of 5-cents, and k2 coins of 1-cent.

———-

Now, if we have coins at our disposal having denominations 1, k, k^2, k^3-cents and we had to represent n cents with the minimum number of coins, we could do the following:-

———–

i = 3;

num = n;

While ( exponent >= 0)

{

Divide num by k^i to get quotient q and remainder r.

Use q coins of denomination k^i cents.

If ( r == 0)

break;

num = r;

i – – ;

}

———

Now, take the following example, represent 31 cents using coins of denominations 1, 10 and 25 cents such that the least number of coins are used.

Here, the idea behind the greedy algorithm of using the maximum possible number of coins of the highest denomination would not work. That approach would get us a solution that uses 6 coins : one 25-cent coin, and 6 1-cent coins.

In contrast, we can get a better solution using 4 coins: 3 coins of 10-cents each and 1 coin of 1-cent.

In the problems presented at the beginning of this post, the greedy approach was applicable since for each denomination, the denomination just smaller than it was a perfect divisor of it. (i.e. 25 was a factor of 125; 5 was a factor of 25; 1 was a factor of 5}. However, that not being true in this case {10 is not a factor of 25} leads to the failure of the greedy approach.

%d bloggers like this: