Strings in C++ : Find the first non-repeated character in a string – 2

APPROACH 2

The previous approach consider each character of the string and then compared it with each of the remaining characters (albeit only those following it) to check if it gets repeated anywhere. Hence it took O(n^2) time. We can do better than that by sorting the string. Once the string is sorted, checking if any given character in that string gets repeated becomes a simple matter of checking the elements adjacent to it.

What we propose to do is this:

– Copy the original string into a new string.

– Sort this new string. (This will take O(n \log n) time. )

But now we are faced with a problem.

We need to determine a non-repeated character in the original or new string. If we wanted just any non-repeated character, that would have been simple. — We would traverse through the sorted new string, and in O(n) time, we would have been able to output a non-repeated character in that string. Hence, the total running time would have been O(n \log n). However, what we want is  the first non-repeated character.

One idea could be to proceed as follows: Take the first character in the original string. Using  the sorted string, check if that character is being repeated. If not, return that character. Else, go to the next character in the original string and repeat the same process again.

But again, we are faced with a problem — How do we determine the location of the first character of the original string in the new sorted string in constant time? (We can, of course, do so in O(n) time. But that would only lead to an O(n^2) algorithm, which would mean that even after sorting we weren’t able to improve the running time.)

—–

From the above, we observe 2 things —

1. Sorting would help in finding repeated / non-repeated characters in a string.

2. But sorting, as such, would not help us efficiently find the first non-repeated character, because we cannot perform a constant time lookup for any given character in the sorted string.

The solution is somewhat clear given the above observations. Do a sort akin to bucket sort. i.e. keep a bucket for each character that a string can possibly have. Once you have that array of buckets each corresponding to one possible character, you can perform constant time lookup in that array for any given character. i.e. we can construct a lookup table. This leads us to the approach detailed in the next post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: