The core of KMP is to leverage PMT to skip many obviously useless comparisons.
PMT (Partial Match Table) is a preprocessed array which provides some information of pattern string.
PMT
PMT is an integer array which stores the length of the longest proper prefix which is also a suffix of the substring of pattern string from the beginning to the current position.
N.B. Here the prefix and suffix should not be the whole string.
len(PMT) = len(pattern)
For example: pattern = "abcabcd"
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
pattern | a | b | c | a | b | c | d |
PMT | 0 | 0 | 0 | 1 | 2 | 3 | 0 |
For PMT[4], the substring is “abcab”, in which the longest proper prefix which is also a suffix is “ab”, so PMT[4]=len(“ab”)=2.
How KMP uses PMT
For example: s = "abcabcabcd"
i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
s | a | b | c | a | b | c | a | b | c | d |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ↑ | ||||
pattern | a | b | c | a | b | c | d | |||
PMT | 0 | 0 | 0 | 1 | 2 | 3 | 0 |
When we are matching pattern
with the string s
, we found the mismatch at pattern[6]
, where PMT[6-1]
is 3.
This means for the substring before pattern[6]
, which is "abcabc"
, the longest proper prefix which is also a suffix is "abc"
with length 3.
Then we can skip all the characters of s
before current position and skip the first 3 characters of pattern
and continue to match s[6]
with the pattern[4]
.
i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
s | a | b | c | a | b | c | a | b | c | d |
✓ | ✓ | ✓ | ↑ | |||||||
pattern | a | b | c | a | b | c | d |
This is because we already have the information that pattern[0:3]
==pattern[3:5]
==s[3:5]
.
And we know this is the longest common prefix and suffix of substring pattern[0:6]
(i.e. also s[0:6]
) so that we are sure that we won’t miss any possible match when we rule out the match starting at s[1]
and s[2]
.
Array next
Array next
is just for the convenience of implementation.
It is just the PMT array with an offset of 1 and we fill the first element with -1.
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
pattern | a | b | c | a | b | c | d |
PMT | 0 | 0 | 0 | 1 | 2 | 3 | 0 |
next | -1 | 0 | 0 | 0 | 1 | 2 | 3 |
So when we find the mismatch in the example above at pattern[j]
where j=6, we just backtrack to pattern[next[j]]
which is pattern[3]
and continue to match.
i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
s | a | b | c | a | b | c | a | b | c | d |
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ↑ | ||||
pattern | a | b | c | a | b | c | d | |||
next | -1 | 0 | 0 | 0 | 1 | 2 | 3 |
Implementation of getNext()
The implementation of function getNext
is similar to the way KMP uses next
. It utilizes the information of its own previous result to calculate the next result.
Golang example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
func getNext(p string) []int {
next := make([]int, len(p))
next[0] = -1
for i, j := 0, -1; i < len(p)-1; {
if j == -1 || p[i] == p[j] {
i += 1
j += 1
next[i] = j
} else {
j = next[j]
}
}
return next
}
Complete Implementation
Golang example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
func KMP(s string, p string) int {
next := make([]int, len(p))
next[0] = -1
for i, j := 0, -1; i < len(p)-1; {
if j == -1 || p[i] == p[j] {
i += 1
j += 1
next[i] = j
} else {
j = next[j]
}
}
var i, j int
for i < len(s) && j < len(p) {
if j == -1 || s[i] == p[j] {
i += 1
j += 1
} else {
j = next[j]
}
}
if j == len(p) {
return i - j
}
return -1
}
Licensed under CC BY-SA 4.0
Comments