Question

dynamic_programming

תת-מחרוזת משותפת ארוכה ביותר - יהיו w1=a1a1...an,w2=b1b2...bmw_1=a_1a_1...a_n , w_2=b_1b_2...b_m מחרוזות מעל אותו אלפבית, תת מחרוזת היא תת סדרה עולה של אותיות מתוך מחרוזת. (עולה ביחס לאינדקסים), הציעו אלגוריתם שמחזיר את אורך תת המחרוזת המשותפת (תת המחרוזת המופיעה בשתי המחרוזות) הארוכה ביותר הארוכה ביותר וכן את תת המחרוזת עצמה.

1

Answers

Lets us define the next recursive expression: S(i,j)S(i,j) the longest common sub-string of a1a2...ai,b1b2...bja_1a_2...a_i , b_1b_2...b_j. So our solution will be S(n,m)S(n,m)

  • S(0,j)=S(i,0)=0S(0,j) = S(i,0) = 0 - A definition for empty words.
  • S(i,j)=1+S(i1,j1)S(i,j) = 1+S(i-1, j-1) in case ai=aja_i = a_j or maxS(i1,j),S(i,j1)max{S(i-1,j) , S(i,j-1)} otherwise.

algorithm:

  1. Start an n+1×m+1n + 1\times m+1 matrix full with 00's.
  2. Fill the matrix diagonally (right to left diagonals starting at S(1,1)S(1,1) then S(1,2),S(2,1)S(1,2) , S(2,1), then S(1,3),S(2,2),S(3,2)S(1,3) , S(2,2), S(3,2) and do forth ) using the S(i,j)S(i,j) expressions (pay attention that S(0,j),S(i,0)S(0,j) , S(i,0) are already filled).In addition: at each time when S(i,j)=1+(S(i1),(j1))S(i,j) = 1+(S(i-1),(j-1)) save the pair (i,j)(i,j) (those pairs are the sub-string's indexes in w1,w2w_1 , w_2 respectively).
  3. Return S(n,m)S(n,m), and the sub-string formed from the saved indexes.

Complexity O(nm)O(nm) time, O(max{m,n})O(max\{m,n\}) space.

Correctness:

  • Let us prove by induction that for all (i,j)(i,j) , S(i,j)S(i,j) values is correct.
  • Base case: i=0i = 0 so, for all j[m]j\in [m] the longest common sub-string is the empty string from length 00, and S(0,j)S(0,j) is defined to be 00 as needed.
  • Assumption: for all 1i<k1\le i < k and *for all* j[m]j\in [m] S(i,j)S(i,j) holds the correct values.
  • Step: (now we start an induction over j values when i=ki=k through the whole induction process).
    • Base: j=0j=0 as explained before the longest common sub-string is from length 00, and S(k,0)=0S(k,0) = 0 (as defined).
    • Assumption : for all 1j<l1\le j < l we assume S(k,j)S(k,j) holds the right value.
    • Step:
      1. In case ak̸=bla_k \neq b_l: the longest common sub-string can't include both ak,bka_k,b_k so the longest common sub-string is the maximal between the longest sub-string formed from a1...ak,b1...bl1a_1...a_k, b_1...b_{l-1} or from a1...ak1,b1...bla_1...a_{k-1} , b_1...b_l. In this case, the induction's assumption promises S(k,l1),S(k1,l)S(k,l-1) , S(k-1,l) holds the correct values, so S(k,l)=max{S(k,l1),S(k1,l)}S(k,l) = max\{S(k,l-1) , S(k-1,l)\} is the correct value.
      2. In case ak=bla_k = b_l, let us assume there exists a common sub-string SS such that S>S(k1,l1)+1|S| > S(k-1,l-1) +1 -:
        • in case SS is formed from the indexes k,lk,l from w1,w2w_1 ,w_2 respectively, then S=S(ak,bl)S = S' \cup (a_k,b_l) when SS' is formed from the indexes [k1],[l1][k-1],[l-1] from our induction assumption we know that S(k1,l1)>SS(k-1,l-1) > |S'| therefore SS(k1,l1)+1|S'|\le S(k-1,l-1)+1 which contradicts the assumption.
        • in case SS' is formed from kk and not from ll (means a couple for aka_k is taken from b1b2...bl1b_1b_2...b_{l-1} then S<S(k1,l)|S'| < S(k-1,l) by the induction assumption, Now we need to make the last observation: S(k1,l)S(k1,l1)+1S(k-1,l) \le S(k-1,l-1)+1, this holds due to the fact that in S(k1,l)S(k-1,l) one may take at list one indexes pair (ai,bj)(a_i,b_j) which doesn't appear in S(k1,l1)S(k-1,l-1) which is an indexes pair of the form (i,l)(i,l), thus by adding 11 to S(k1,l1)S(k-1,l-1) we may bound the difference of S(k1,l1),S(k1,l)S(k-1,l-1), S(k-1,l). So in conclusion S<S(k1,l)S(k1,l1)+1=S(i,j)|S'| < S(k-1,l) \le S(k-1,l-1)+1 = S(i,j) so this contradicts SS' existence.
        • The case of SS' is formed from ll but not from kk is symmetric.

This finishes the prove.

0

Lets us define the next recursive expression: S(i,j)S(i,j) the longest common sub-string of a1a2...ai,b1b2...bja_1a_2...a_i , b_1b_2...b_j. So our solution will be S(n,m)S(n,m)

  • S(0,j)=S(i,0)=0S(0,j) = S(i,0) = 0 - A definition for empty words.
  • S(i,j)=1+S(i1,j1)S(i,j) = 1+S(i-1, j-1) in case ai=aja_i = a_j or maxS(i1,j),S(i,j1)max{S(i-1,j) , S(i,j-1)} otherwise.

algorithm:

  1. Start an n+1×m+1n + 1\times m+1 matrix full with 00's.
  2. Fill the matrix diagonally (right to left diagonals starting at S(1,1)S(1,1) then S(1,2),S(2,1)S(1,2) , S(2,1), then S(1,3),S(2,2),S(3,2)S(1,3) , S(2,2), S(3,2) and do forth ) using the S(i,j)S(i,j) expressions (pay attention that S(0,j),S(i,0)S(0,j) , S(i,0) are already filled).In addition: at each time when S(i,j)=1+(S(i1),(j1))S(i,j) = 1+(S(i-1),(j-1)) save the pair (i,j)(i,j) (those pairs are the sub-string's indexes in w1,w2w_1 , w_2 respectively).
  3. Return S(n,m)S(n,m), and the sub-string formed from the saved indexes.

Complexity O(nm)O(nm) time, O(max{m,n})O(max\{m,n\}) space.

Correctness:

  • Let us prove by induction that for all (i,j)(i,j) , S(i,j)S(i,j) values is correct.
  • Base case: i=0i = 0 so, for all j[m]j\in [m] the longest common sub-string is the empty string from length 00, and S(0,j)S(0,j) is defined to be 00 as needed.
  • Assumption: for all 1i<k1\le i < k and *for all* j[m]j\in [m] S(i,j)S(i,j) holds the correct values.
  • Step: (now we start an induction over j values when i=ki=k through the whole induction process).
    • Base: j=0j=0 as explained before the longest common sub-string is from length 00, and S(k,0)=0S(k,0) = 0 (as defined).
    • Assumption : for all 1j<l1\le j < l we assume S(k,j)S(k,j) holds the right value.
    • Step:
      1. In case ak̸=bla_k \neq b_l: the longest common sub-string can't include both ak,bka_k,b_k so the longest common sub-string is the maximal between the longest sub-string formed from a1...ak,b1...bl1a_1...a_k, b_1...b_{l-1} or from a1...ak1,b1...bla_1...a_{k-1} , b_1...b_l. In this case, the induction's assumption promises S(k,l1),S(k1,l)S(k,l-1) , S(k-1,l) holds the correct values, so S(k,l)=max{S(k,l1),S(k1,l)}S(k,l) = max\{S(k,l-1) , S(k-1,l)\} is the correct value.
      2. In case ak=bla_k = b_l, let us assume there exists a common sub-string SS such that S>S(k1,l1)+1|S| > S(k-1,l-1) +1 -:
        • in case SS is formed from the indexes k,lk,l from w1,w2w_1 ,w_2 respectively, then S=S(ak,bl)S = S' \cup (a_k,b_l) when SS' is formed from the indexes [k1],[l1][k-1],[l-1] from our induction assumption we know that S(k1,l1)>SS(k-1,l-1) > |S'| therefore SS(k1,l1)+1|S'|\le S(k-1,l-1)+1 which contradicts the assumption.
        • in case SS' is formed from kk and not from ll (means a couple for aka_k is taken from b1b2...bl1b_1b_2...b_{l-1} then S<S(k1,l)|S'| < S(k-1,l) by the induction assumption, Now we need to make the last observation: S(k1,l)S(k1,l1)+1S(k-1,l) \le S(k-1,l-1)+1, this holds due to the fact that in S(k1,l)S(k-1,l) one may take at list one indexes pair (ai,bj)(a_i,b_j) which doesn't appear in S(k1,l1)S(k-1,l-1) which is an indexes pair of the form (i,l)(i,l), thus by adding 11 to S(k1,l1)S(k-1,l-1) we may bound the difference of S(k1,l1),S(k1,l)S(k-1,l-1), S(k-1,l). So in conclusion S<S(k1,l)S(k1,l1)+1=S(i,j)|S'| < S(k-1,l) \le S(k-1,l-1)+1 = S(i,j) so this contradicts SS' existence.
        • The case of SS' is formed from ll but not from kk is symmetric.

This finishes the prove.

0
Feedback