Understand the Core of KMP

The core of KMP is to leverage PMT to skip many obviously useless comparisons.

PMT (Partial Match Table) is a preprocessed array which provides some information of pattern string.

PMT

PMT is an integer array which stores the length of the longest proper prefix which is also a suffix of the substring of pattern string from the beginning to the current position.

N.B. Here the prefix and suffix should not be the whole string.

len(PMT) = len(pattern)

For example: pattern = "abcabcd"

index	0	1	2	3	4	5	6
pattern	a	b	c	a	b	c	d
PMT	0	0	0	1	2	3	0

For PMT[4], the substring is “abcab”, in which the longest proper prefix which is also a suffix is “ab”, so PMT[4]=len(“ab”)=2.

How KMP uses PMT

For example: s = "abcabcabcd"

i	0	1	2	3	4	5	6	7	8	9
s	a	b	c	a	b	c	a	b	c	d
	✓	✓	✓	✓	✓	✓	↑
pattern	a	b	c	a	b	c	d
PMT	0	0	0	1	2	3	0

When we are matching pattern with the string s, we found the mismatch at pattern[6], where PMT[6-1] is 3.

This means for the substring before pattern[6], which is "abcabc", the longest proper prefix which is also a suffix is "abc" with length 3.

Then we can skip all the characters of s before current position and skip the first 3 characters of pattern and continue to match s[6] with the pattern[4].

i	0	1	2	3	4	5	6	7	8	9
s	a	b	c	a	b	c	a	b	c	d
				✓	✓	✓	↑
pattern				a	b	c	a	b	c	d

This is because we already have the information that pattern[0:3]==pattern[3:5]==s[3:5].

And we know this is the longest common prefix and suffix of substring pattern[0:6] (i.e. also s[0:6]) so that we are sure that we won’t miss any possible match when we rule out the match starting at s[1] and s[2].

Array `next`

Array next is just for the convenience of implementation. It is just the PMT array with an offset of 1 and we fill the first element with -1.

index	0	1	2	3	4	5	6
pattern	a	b	c	a	b	c	d
PMT	0	0	0	1	2	3	0
next	-1	0	0	0	1	2	3

So when we find the mismatch in the example above at pattern[j] where j=6, we just backtrack to pattern[next[j]] which is pattern[3] and continue to match.

i	0	1	2	3	4	5	6	7	8	9
s	a	b	c	a	b	c	a	b	c	d
	✓	✓	✓	✓	✓	✓	↑
pattern	a	b	c	a	b	c	d
next	-1	0	0	0	1	2	3

Implementation of `getNext()`

The implementation of function getNext is similar to the way KMP uses next. It utilizes the information of its own previous result to calculate the next result.

Golang example:

func getNext(p string) []int {
	next := make([]int, len(p))
	next[0] = -1
	for i, j := 0, -1; i < len(p)-1; {
		if j == -1 || p[i] == p[j] {
			i += 1
			j += 1
			next[i] = j
		} else {
			j = next[j]
		}
	}
	return next
}

Complete Implementation

Golang example:

func KMP(s string, p string) int {
	next := make([]int, len(p))
	next[0] = -1
	for i, j := 0, -1; i < len(p)-1; {
		if j == -1 || p[i] == p[j] {
			i += 1
			j += 1
			next[i] = j
		} else {
			j = next[j]
		}
	}
	
	var i, j int
	for i < len(s) && j < len(p) {
		if j == -1 || s[i] == p[j] {
			i += 1
			j += 1
		} else {
			j = next[j]
		}
	}
	if j == len(p) {
		return i - j
	}
	return -1
}

Licensed under CC BY-SA 4.0

Understand the Core of KMP

PMT

How KMP uses PMT

Array next

Implementation of getNext()

Complete Implementation

Comments

Array `next`

Implementation of `getNext()`