The core of KMP is to leverage PMT to skip many obviously useless comparisons.

PMT (Partial Match Table) is a preprocessed array which provides some information of pattern string.

PMT

PMT is an integer array which stores the length of the longest proper prefix which is also a suffix of the substring of pattern string from the beginning to the current position.

N.B. Here the prefix and suffix should not be the whole string.

len(PMT) = len(pattern)

For example: pattern = "abcabcd"

index 0 1 2 3 4 5 6
pattern a b c a b c d
PMT 0 0 0 1 2 3 0

For PMT[4], the substring is “abcab”, in which the longest proper prefix which is also a suffix is “ab”, so PMT[4]=len(“ab”)=2.

How KMP uses PMT

For example: s = "abcabcabcd"

i 0 1 2 3 4 5 6 7 8 9
s a b c a b c a b c d
       
pattern a b c a b c d      
PMT 0 0 0 1 2 3 0      

When we are matching pattern with the string s, we found the mismatch at pattern[6], where PMT[6-1] is 3.

This means for the substring before pattern[6], which is "abcabc", the longest proper prefix which is also a suffix is "abc" with length 3.

Then we can skip all the characters of s before current position and skip the first 3 characters of pattern and continue to match s[6] with the pattern[4].

i 0 1 2 3 4 5 6 7 8 9
s a b c a b c a b c d
             
pattern       a b c a b c d

This is because we already have the information that pattern[0:3]==pattern[3:5]==s[3:5].

And we know this is the longest common prefix and suffix of substring pattern[0:6] (i.e. also s[0:6]) so that we are sure that we won’t miss any possible match when we rule out the match starting at s[1] and s[2].

Array next

Array next is just for the convenience of implementation. It is just the PMT array with an offset of 1 and we fill the first element with -1.

index 0 1 2 3 4 5 6
pattern a b c a b c d
PMT 0 0 0 1 2 3 0
next -1 0 0 0 1 2 3

So when we find the mismatch in the example above at pattern[j] where j=6, we just backtrack to pattern[next[j]] which is pattern[3] and continue to match.

i 0 1 2 3 4 5 6 7 8 9
s a b c a b c a b c d
       
pattern a b c a b c d      
next -1 0 0 0 1 2 3      

Implementation of getNext()

The implementation of function getNext is similar to the way KMP uses next. It utilizes the information of its own previous result to calculate the next result.

Golang example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
func getNext(p string) []int {
	next := make([]int, len(p))
	next[0] = -1
	for i, j := 0, -1; i < len(p)-1; {
		if j == -1 || p[i] == p[j] {
			i += 1
			j += 1
			next[i] = j
		} else {
			j = next[j]
		}
	}
	return next
}

Complete Implementation

Golang example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
func KMP(s string, p string) int {
	next := make([]int, len(p))
	next[0] = -1
	for i, j := 0, -1; i < len(p)-1; {
		if j == -1 || p[i] == p[j] {
			i += 1
			j += 1
			next[i] = j
		} else {
			j = next[j]
		}
	}
	
	var i, j int
	for i < len(s) && j < len(p) {
		if j == -1 || s[i] == p[j] {
			i += 1
			j += 1
		} else {
			j = next[j]
		}
	}
	if j == len(p) {
		return i - j
	}
	return -1
}

Creative Commons License Licensed under CC BY-SA 4.0

Comments