KMP algorithm is used in finding a substring in a given string. Or we can say it searches for the occurrences of a “word” in a “text string”.
KMP based on dynamic programming(memoize the last part). In the normal scenario with brute forcing, we can get the searched string in O(n*m) where n is the text length and m is the searched word’s length. For example-
Text String -> ABAAABAA
Search String -> AABA
If we take indexes as i=0 and j=0 from both the strings then the comparison will go something like
text[i] compare string[j]=> A compare A (i=0,j=0)
text[i] compare string[j]=> B compare A (i=1,j=1)
text[i] compare string[j]=> B compare A (i=1,j=0)
text[i] compare string[j]=> A compare A (i=2,j=0)
text[i] compare string[j]=> A compare A (i=3,j=1)
text[i] compare string[j]=> A compare B (i=4,j=2)
text[i] compare string[j]=> A compare A (i=3,j=0)
……. and so on untill we find the match . In this method we have to start from the start of searched string. Hence it is slower and O(n*m) . This same problem can be solved in O(n+m) using KMP algorithm.
In this algorithm, we need to pre-calculate for searched string. If there exist a suffix which is also a prefix at any given index.
ABCAB at index 4(count from 0) has AB as the suffix which is also a prefix of the string. So if we are matching
ABCABD against ABCABCABD when i =5 and j=5 we get C compare with D
Then instead of going back to j=0 in the searched string we go back to j =2 as we have a suffix of length 2 (AB ) which is also a prefix that means AB is already matched in the text string. And we don’t change i.
Then we move on with the same algorithm
So we get that the searched string exists in the text string at index 3.
So to precalculate the prefix-suffix table for the searched string we use the following algorithm-
initialize an array of size n (length of searched string) with 0.
we take 2 counters i=0 and j=1
else if i =0
i = array[i-1]
array = *length_of_string first_index = 0 second_index = 1 while second_index<length_of_string: if s[first_index]==s[second_index]: array[second_index] = first_index+1 first_index += 1 second_index += 1 else: if first_index==0: second_index += 1 else: first_index = array[first_index-1]
Using above code we get the following working on searched string
From above table we get that at index 3 there is a suffix of length 1 which is the prefix as well. At index 4 we have that of length 2 and rest we have as 0.
The time complexity of this algorithm is O(n+m) as we traverse the text string only once and the searched string is traversed in linear time.
Source code for this algorithm can be found here.