Dynamic Programming – 2 : Longest Common Subsequence
February 3, 2011 Leave a comment
Longest Common Subsequence (LCS)
We are given 2 sequences and . Our task is to find the LCS of X and Y.
Terminology: A subsequence of X is a set of elements that can be obtained from the elements of X such that no 2 two elements of the subsequence are out of order wrt the ordering defined by X. e.g. if X = < E L E P H A N T >, then < E P H T>, < E E A > etc are subsequences of X.
A common subsequence of X and Y is a sequence Z that is a subsequence of both X and Y. The longest such Z is an LCS.
Let OPT (m, n) denote the length of an LCS of X and Y. (Note that there may be more than one LCS’s, each of them having the same optimum length.)
Either of these 2 cases is true : either or .
If then we can certainly include that element in the LCS and proceed to finding the LCS of and .
Notation: Let denote . is defined analogously.
Notation: Let denote the length of the LCS of and .
Therefore, we have that:
If , then we have that:
Generalizing the above, we have that, for and :
We can use the above recurrence to compute the optimum length in a bottom up fashion.
Pseudocode for computing the optimum length:
Initialize OPT (0, i) and OPT (j, 0) to be 0 for all 1 <= i <= n, and 1 <= j <= m.
For i from 1 to m:
For j from 1 to n:
Compute OPT (i,j) and OPT (j,i) using the above recurrence.
End of algorithm.
What we want is the length of OPT (m,n). The above algorithm computes OPT(m,n) in O(mn) time.
Using the table of OPT (i,j) values, we can compute the LCS recursively.
Function: OPT_Solution (i,j) :=
If (i == 0) OR ( j == o)
// Exit; Break recursion.
OPT_Solution (i-1, j-1);
OPT_Solution (i-1, j);
OPT_Solution (i, j-1);
End of algorithm.
The above code for printing the LCS runs in O(m+n) time.