Dynamic Programming – 6 : Optimum sum of tree nodes

Optimum sum of tree nodes

We are given a rooted tree $T$ with root $r$. Each node $x$ has a real value $v(x)$ associated with it. We are to find a subset of nodes of the tree, such the sum of the values of the selected nodes is maximized, subject to the condition that if we choose a node $x$, we cannot choose any of its children or its parent.

Solution.

Notation. Let $OPT (x)$ denote the optimum value on a sub-tree rooted at node $x$. Let $OPT (1,x)$ denote the optimum value on a sub-tree rooted at node $x$ subject to the condition that $x$ is selected as part of the optimal subset. Let $OPT(0,x)$ denote the optimum value on a sub-tree rooted at $x$ subject to the condition that $x$ is not selected as part of the optimal subset.

Then, we have that:

$OPT(x) = max \{OPT(1,x), OPT(0,x)\}$.

$OPT(1,x) = v(x) + \sum_y OPT (0,y)$ (where the summation is over all children $y$ of $x$.)

$OPT(0,x) = \sum_y OPT (y)$ (where the summation is over all children $y$ of $x$.)

Also, if $x$ is a leaf, then $OPT(1,x) = v(x)$, and $OPT(0,x) = 0$.

Pseudocode for computing OPT(root)

We can compute OPT(root) in a bottom up fashion starting from the leaves. The recurrence equations for a node $x$ involve the OPT values for its children. Thus, before computing the $OPT$ value for node $x$, we compute the $OPT$ values for its children.

Let the nodes of the tree be arranged in levels from Level 0 till Level L such that the root is at Level 0.

For i going from L downto 0:

For all nodes x in Level i:

Compute OPT[1,x], OPT[0,x] and OPT[x] using the above recurrence.

Set Soln[x] = 1 if OPT[1,x] >= OPT [0,x]. Else, set Soln[x] = 0.

End.

—-

The above code has a running time of $O(n)$ where $n$ represents the number of nodes in the tree. [ We compute OPT values for n nodes. For each node, we compute the OPT values by summing up the OPT values over all of its children. Aggregating over all nodes, and observing the each node can have at most one parent, we find that the summing up operation costs O(n). Therefore, the total cost is O(n+n) which is O(n). Another way to look at it is that the aggregate cost of the summing up operations over all nodes is O(m) [where m represents the number of edges in the tree], which for a tree is O(n). ]

—-

Computing the optimal subset:

We do so using the Soln[x] values.

Function: Print_Solution (x)

If (Soln[x] == 1)

Print x.

For all grand-children y of x:

Print_Solution (y);

Else

For all children y of x:

Print_Solution (y);

End.

—-

Print_Solution(root) has a running time of $O(n)$.

—-

Dynamic Programming – 5 : Matrix chain multiplication

Matrix chain mnultiplication

We are given a sequence of $n$ matrices $$, such that matrix $M_i$ has dimensions $a_i \times a_{i+1}$. Our task is to compute an optimal parenthesization of this sequence of matrices such that the sequence is fully parenthesized.

Terminology: A sequence of matrices is fully parenthesized if the sequence is either a single matrix or a product of 2 fully parenthesized sequences.

Optimality here refers to minimization of the number of scalar multiplications involved in computing the product of the matrix sequence.

Note 1:

// C code for multiplying A (m x n) with B (n x p). Product is C (m x p).

for (i=0; i < m; i++)

for (j=0; j < p; j++)

{

C[i][j] = 0;

for (k=0; k<n; k++)

C[i][j] + = A[i][k] * B[k][j];

}

//End of code.

Note 2: From the above, we can see that multiplying an (m x n) matrix with a (n x p) matrix requires m x n x p scalar multiplications.

Note 3: Matrix multiplication is associative. So, we can parenthesize the given sequence of $n$ matrices whichever way we want as long as we do not disturb the sequence itself.

—-

Solution.

If n > 1, we know that the required parenthesization divides the $n$ matrix sequence into 2 smaller fully parenthesized sequences.

Let this division occur at matrix $M_k$, i.e. the number of scalar multiplications in $$ equals the sum of the number of scalar multiplications in $$ and $$ plus the value $a_1 \times a_{k+1} \times a_n$.

Notation: Let $OPT(i,j)$ denote the number of scalar multiplications in the optimal parenthesization of the sequence $$.

From the above, we have that:

$OPT(1,n) = \min_{1 \leq k \leq n-1}\{ OPT (1, k) + OPT(k+1, n) + a_1 \times a_{k+1} \times a_{n+1} \}$.

Generalizing this, we can say that, for $1 \leq i < j \leq n$:

$OPT(i, j) = \min_{i \leq k \leq j-1}\{ OPT (i, k) + OPT(k+1, j) + a_i \times a_{k+1} \times a_{j+1} \}$

Further, $OPT(i,i) = 0$ for all possible $i$.

—-

Pseudocode for computing optimum number of scalar multiplications:

For i from 1 to n:

For j from 1 to (n-i):

Compute OPT [i,j] according to the above recurrence.

Set Soln[i,j] = k; // k is the index that optimizes OPT[i,j] in the recurrence.

End of algorithm.

The above code will take $O(n^3)$ to find $OPT[i,n]$ (in the process, computing the OPT[i,j] and Soln[i,j] table values).

Computing the optimal solution i.e. parenthesization

We can do so using the Soln [i,j] table.

Function: Print_Solution (i, j)

If ( i == j)

Print $A_i$

Else

Print (

Print_Solution (i, Soln[i,j] );

Print_Solution ( Soln[i,j] + 1, j);

Print )

End.

—-

This function takes $O(n)$ time for the call Print_Solution (1, n).

Dynamic Programming – 4 : Sequence Alignment

Sequence Alignment

We are given 2 sequences $X = $ and $Y = < y_1, \ldots, y_n>$. Our task is to produce a valid matching of the elements of $X$ and $Y$ such that the cost of the matching is minimized.

Terminology:

A valid matching denotes a one-to-one correspondence of (some subset of) elements of $X$ and $Y$ with the added condition that there should be no crossing in the matching.i.e. if $x_i$ is matched to $y_j$, and $x_p$ is matched to $y_q$, then $i < p$ implies $j < q$.

e.g.

X = < e l e p h a n t >

Y = < m o n k e y >

The matching $M = \{ (1,2), (3,3), (6, 5) \}$ refers to the valid matching where we match $x_1, y_2$ and $x_3, y_3$ and $x_6, y_5$. Note that there are no crossings in $M$.

Cost of a matching $M$ is the sum of:

(1) matching cost $m_{x_i,y_j}$ for each $(i,j) \in M$.

(2) gap cost $d$ for each element of $X$ or $Y$ which is not part of $M$.

Solution.

Given $X$ and $Y$, we have 2 choices: either include $(m,n)$ in the optimum matching, or exclude it from the optimum matching.

Case 1:

If $(m,n)$ is part of optimum matching, then the optimum cost would be $m_{x_m, y_n}$ plus the optimum cost of matching the sequences $ and $< y_1, \ldots, y_{n-1}$.

Notation: Let $OPT(i,j)$ denote the optimum cost of matching $$ and $$.

Therefore, we have:

$OPT (m,n) = m_{x_m, y_n} + OPT(m-1, n-1)$

Case 2:

If $(m,n)$ is not part of the optimum matching, then one of the following should hold true (because of the no crossing condition):

(1) $x_m$ is not part of the optimum matching.

or, (2) $y_n$ is not part of the optimum matching.

Therefore, we have:

$OPT(m,n) = min \{ OPT(m-1, n) + d, OPT (m,n-1) + d\}$

—-

Generalizing the above recurrence, we have, for $1 \leq i \leq m$ and $1 \leq j \leq n$:

$OPT (i, j) = min \{ m_{x_i, y_j} + OPT (i-1, j-1), OPT (i-1, j) + d, OPT (i, j-1) + d\}$

Also, we have the base conditions that:

$OPT (i, 0) = i \times d$ for all $0 \leq i \leq m$; and analogously for $OPT (0, j)$.

—-

Pseudocode for computing optimum value:

Initialize OPT [i, 0] and OPT [0,j] as given above.

For i from 1 to m:

For j from 1 to n:

Compute OPT [i,j] using the given recurrence.

End of algorithm.

—-

The above algorithm computes the optimum value $OPT (m,n)$ in $O(mn)$ time.

—-

Computing the optimum solution using OPT [i,j] values:

Let us denote :

(1) $m_{x_i, y_j} + OPT (i-1, j-1)$ by Expr1.

(2) $d + OPT (i, j-1)$ by Expr2.

(3) $d + OPT (i-1, j)$ by Expr3.

Function: OPT_Solution (i,j)

If (i = = 0) OR ( j = = 0)

Return;

If Expr1 <= both Expr2 and Expr3

OPT_Solution (i-1, j-1);

Print: Index i of X is matched to Index j of Y.

Else if: Expr2 <= both Expr1 and Expr3

OPT_Solution (i, j-1);

Else:

OPT_Solution (i-1, j);

End of function.

—-

What we want to compute is OPT_Solution (m, n). The above function computes that in time $O(m+n)$.

—-

Dynamic Programming – 3 : Subset Sum

Subset Sum

We are given $n$ items $\{I_1, \ldots, I_n \}$, each having non-negative integral weight $w_i$. We are also given a non-negative integral weight bound $W$. Our task is to find a valid subset of items which maximizes the total weight of the items in the subset.

Terminology: A valid subset $T$ is one such that the total weight of items in $T$ is at most $W$.

Solution.

For item $I_n$, we have 2 choices : either to include it in the optimum solution, or to not include it.(Note that if $w_n > W$, then we have just one choice  — exclude $I_n$ from the optimum solution.)

Case 1:

If $I_n$ is included in the optimum solution, the optimum solution would be $w_n$ plus the optimum solution obtained from the set of items $\{I_1, \ldots, I_{n-1}\}$ with the weight bound as $W-w_n$.

Notation: Let OPT (i, w) denote the value of the optimum solution obtained on the set $\{I_1, \ldots, I_i\}$ with weight bound $w$.The original task, then, is to compute $OPT(n, W)$.

We have that, if $I_n$ is included in the optimum solution,

$OPT (n, W) = w_n + OPT (n-1, W-w_n)$.

Case 2:

If $I_n$ is not included in the optimum solution, then we have that:

$OPT (n, W) = OPT (n-1, W)$.

—-

Generalizing the above, we have that, for $1 \leq i \leq n$ and $0 \leq w \leq W$:

$OPT ( i, w) = OPT (i-1, w)$ [if $w_i > w$]

Else,

$OPT (i, w) = max \{ w_i + OPT (i-1, w- w_i), OPT (i-1, w) \}$

—-

Also, $OPT (0, w) = 0$ for all $w$.

—-

Pseudocode for computing the optimum value:

Based on the above recurrence, we can construct an algorithm.

For i from 1 to n:

For w from 0 to W:

Compute OPT[i,w] using the above recurrence.

End.

—-

The above code runs in $O(nW)$ time [since O(1) time is spent for computing each OPT[i,w] value].

—-

Computing the Optimum solution:

We can compute the optimum solution recursively given the table of OPT[i,w] values.

Function: Print_OPT (k, w)

If $k == o$

Return;  //Breaking the recursion. We can also have the condition as $k == 0$ OR $w == 0$

If $w_k > w$

Print_OPT (k-1, w);

Else if $w_k + OPT (k-1, w-w_k) > OPT (k-1, w)$

Print_OPT (k-1, w- w_k);

Print $I_k$

Else

Print_OPT (k-1, w);

End of function.

The optimum solution is printed on calling Print_OPT (n, W). The time taken is O(n).

—-

Extended Longest Path problem

These are slides from a talk on the Longest Path Problem at the Penn State Theory seminar.

Talk_longest path

In the Longest Path Problem, we are given an undirected graph, G =(V,E), and integer $k \geq 0$, and are asked to find out whether or not there exists a simple path of length k in G.

The Extended Longest Path Problem is a variant of the above. Here, we are asked to find out all pairs of vertices that have a simple path of length k between them.

Longest Path: Method of Random Orientations

In this post, we shall consider an NP-complete problem, namely that of determining whether a simple path of length k exists in a given graph, G = (V,E).

Formally,

LONGEST PATH PROBLEM

Input: undirected graph, G = (V,E), integer $k \geq 0$.

Parameter: k

To find: Does G contain a simple path of length, k?

Vertex Cover and Set Cover – 4

Question: We are given an instance of the Vertex Cover problem. We are also given an algorithm for the Set Cover problem. Using this algorithm as a black box, solve the given instance of the Vertex Cover problem.

Solution:

Given instance of Vertex Cover problem: undirected graph, G = (V, E); positive integer k.
Question: Does G have a vertex cover of size at most k?

===