Longest Common Subsequence(LCS)

#Day 2

Today’s algorithm is longest common subsequence(LCS). This algorithm finds a string of maximal length which is a common subsequence of two given string.

Example: Longest common subsequence between RANDOM and RADIUM is RADM.

This algorithm is somewhat similar to edit distance algorithm. 

As my previous post on edit distance, this problem too can be solved using dynamic programming in a bottom-up manner.

For this algorithm, we will take two string

stringA = ‘TRAKS’

stringB = ‘TRUCK’

We will create a table of size((n+1)*(m+1)), where n is the length of stringA and m is the length of stringB and initialize the first row and first column of the same with 0.(Image: 1)

lcs
Image 1

Values of row o in the above table state that longest common sequence between

‘T’ and ‘ ‘ is 0

‘TR’ and ‘ ‘ is 0

‘TRA’ and ‘ ‘ is 0 and so on

In the same manner, column 0 states that LCS between

‘ ‘ and ‘T’ is 0

‘ ‘ and ‘TR’ is 0

‘ ‘ and ‘TRU’ is 0

and so on

Then we start filling rest of the table row by row. Rules for doing so are as-

  • if stringA[column_index]== stringB[row_index]
    • table[row_index][column_index] = table[row_index-1][column_index-1]+1

      lcs1
      Image 2
  • if stringA[column_index]!= stringB[row_index]
    • table[row_index][column_index] = max(table[row_index-1][column_index],table[row_index][column_index-1])

      lcs2
      Image 3

Final table which will be constructed is as image 4

lcs4
Image 4

Length of the LCS is table[last_row][last_column].

For the construction of the string, we do the back-tracking on the table starting from the right bottom value. In this example, it is table[5] [5]

Initialize and empty string as LCS

Rules for trace backing are-

  •  if stringA[column_index]== stringB[row_index]
    • add stringA[column_index] to the start of LCS
    • move back to table[row_index-1][column_index-1]
  • else
    • move back to max(table[row_index-1][column_index],table[row_index][column_index-1])
    • in case table[row_index-1][column_index]==table[row_index][column_index-1] we can move to table[row_index-1][column_index] or table[row_index][column_index-1]

According to this back-tracing LCS comes as TRK with a length of 3.

Time Complexity

Time complexity of this algorithm is O(n*m). Where n is the length of string A and m is the length of string B.

Source Code

Source code for the implementation of this problem can be found here. source code link

Hackerrank Problem

If you want to implement and check the result you can go to this link to solve the problem based on the same algorithm.

Credits

Implementation of this algorithm is inspired by Tushar Roy’s Youtube channel .

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s