0
Explore
0

Knuth–Morris–Pratt (KMP) String Matching Algorithm

Updated on October 23, 2025

The Knuth–Morris–Pratt (KMP) Algorithm is an efficient way to find a pattern inside a text. It was designed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977.

Unlike the Naïve String Matching Algorithm, which rechecks already matched characters after a mismatch, KMP avoids unnecessary comparisons by using previous match information.
This makes it much faster — especially for large strings.

In simple words, KMP uses a smart shifting strategy. When a mismatch happens, it uses the LPS (Longest Prefix which is also a Suffix) array to decide where to resume matching — instead of restarting from the next character.

Quescol 1-Minute Read

The Knuth–Morris–Pratt (KMP) Algorithm, designed by Knuth, Morris, and Pratt in 1977, is an efficient pattern matching algorithm used to find occurrences of a smaller string (pattern) within a larger string (text). Unlike the naive approach, which rechecks already matched characters after a mismatch, KMP uses a Longest Prefix Suffix (LPS) array to skip unnecessary comparisons, making it much faster, especially for large texts.

The algorithm works in two main steps:

  • First, it preprocesses the pattern to build the LPS array, which indicates how many characters can be skipped after a mismatch.
  • Second, it scans the text using the LPS array to efficiently find matches without restarting comparisons.

KMP is particularly useful for repeated searches, handling repeated patterns, and minimizing CPU usage.

Complexity of Knuth–Morris–Pratt (KMP) Algorithm:

Let’s Understand in Depth

What is KMP Algorithm?

The Knuth–Morris–Pratt (KMP) Algorithm is a pattern matching algorithm that searches for occurrences of a pattern string within a text string. It uses information from previously matched characters to avoid redundant comparisons, making it more efficient than the naive approach.

The algorithm works in two main steps:

  • Using the LPS Array: This array helps the algorithm determine how many characters can be skipped when a mismatch occurs, preventing unnecessary rechecks.
  • Preprocessing the Pattern: It creates a helper array called the LPS array (Longest Proper Prefix which is also a Suffix).

Key Concepts:

  • Prefix: Beginning part of the string.
  • Suffix: Ending part of the string.
  • Proper prefix/suffix: Prefix or suffix that is not equal to the entire string.

For example,

For "ABAB",

  • Prefixes → A, AB, ABA
  • Suffixes → B, AB, BAB

The LPS value at each position helps the algorithm skip unnecessary comparisons.

Working Procedure of KMP (Step-by-Step)

Step 1: Preprocess the Pattern

Before searching, KMP computes an LPS array for the pattern. This LPS array tells us how much we can skip if a mismatch occurs. (Below We will see in details how to calculate LPS array)

Example:

Pattern = "ABABAC"
LPS = [0, 0, 1, 2, 3, 0]

This means:

  • For position 0 (A): no prefix = 0
  • For position 3 (B): longest prefix-suffix = 2 (“AB”)

Step 2: Pattern Searching

Now we compare the pattern with the text:

  1. Start comparing from the beginning.
  2. If characters match, move both pointers (text and pattern).
  3. If mismatch occurs:
    • Do not move the text pointer backward.
    • Move the pattern pointer using the LPS array value.
  4. Continue until the full text is scanned.

This way, each character in the text is processed only once.

Example of KMP String Matching Algorithm

Lets see below example to understand how KMP String Matching Algorithm works.

Text (T): ABABDABACDABABCABAB
Pattern (P): ABABCABAB

Compute LPS Array

Pattern: A B A B C A B A B
LPS Array: 0 0 1 2 0 1 2 3 4

Step-by-step LPS Array calculation :

IndexCharacterSubstring (0..i)Longest Proper Prefix = SuffixLPS[i]Explanation
0AA0Single character → no prefix/suffix
1BAB0No match between prefix & suffix
2AABA“A”1Prefix “A” = Suffix “A”
3BABAB“AB”2Prefix “AB” = Suffix “AB”
4CABABC0No prefix equals suffix
5AABABCA“A”1Prefix “A” = Suffix “A”
6BABABCAB“AB”2Prefix “AB” = Suffix “AB”
7AABABCABA“ABA”3Prefix “ABA” = Suffix “ABA”
8BABABCABAB“ABAB”4Prefix “ABAB” = Suffix “ABAB”

Step-by-Step KMP Pattern Matching Process

We’ll use:

  • i → index for text
  • j → index for pattern

Both start from 0.

Step 1: Start comparing

Text:    A B A B D A B A C D A B A B C A B A B
Pattern: A B A B C A B A B
          ↑
i=0, j=0

Characters match → move both i and j forward.

Step 2–4: Continue matching

Text:    A B A B D ...
Pattern: A B A B C ...
                 ↑
i=4, j=4

❌ Mismatch occurs (pattern has C, text has D).

→ Set j = LPS[j-1] = LPS[3] = 2
→ Do not move text pointer back (i stays 4)

Step 5: Resume comparison from i=4, j=2

Text:    A B A B D A ...
Pattern:       A B A B C ...
                   ↑

❌ Mismatch again → since j > 0, set j = LPS[j-1] = LPS[1] = 0.

Then move i → 5.

Step 6–10: Keep comparing

You’ll find partial matches but mismatches will cause j to reset using LPS logic — skipping rechecks.

Step 11: Full Match Found

When i reaches index 18 in text:

Text:    A B A B D A B A C D A B A B C A B A B
Pattern:                 A B A B C A B A B
                          ↑ (Starts at index 10)

All characters of the pattern match → j = 9 (pattern length)

So, pattern found at index 10 in the text 🎯

Why It’s Efficient?

  • No backtracking in the text — only j moves intelligently using LPS.
  • LPS lets KMP reuse previous match info instead of restarting every time.

Summary:

StepActionResult
1Start with i=0, j=0Compare pattern with text
2Match → move bothi++, j++
3Mismatch → use LPSj = LPS[j-1]
4Continue until end of textFind matches
5Match foundIndex = 10

Algorithm

Algorithm for KMP String Matching

Input:

  • Text T of length n
  • Pattern P of length m

Output:

  • All starting positions where P appears in T.

Step 1: Build LPS Array

  1. Initialize an array lps[m] with all zeros.
  2. Set len = 0 (length of the previous longest prefix-suffix).
  3. For i = 1 to m-1:
    • If P[i] == P[len], increment len and set lps[i] = len.
    • Else, if len != 0, set len = lps[len - 1].
    • Else, set lps[i] = 0 and increment i.

Step 2: Search the Pattern

  1. Initialize i = 0 (index for text) and j = 0 (index for pattern).
  2. While i < n:
    • If P[j] == T[i]: increment both i and j.
    • If j == m: print “Pattern found at index i – j”; set j = lps[j - 1].
    • Else if i < n and P[j] != T[i]:
      • If j != 0, set j = lps[j - 1]; else increment i.

Java Program to implement Knuth–Morris–Pratt (KMP) String Matching Algorithm

public class KMPAlgorithm {

    // Function to compute LPS array
    void computeLPSArray(String pattern, int m, int[] lps) {
        int length = 0;
        int i = 1;
        lps[0] = 0; // first LPS is always 0

        while (i < m) {
            if (pattern.charAt(i) == pattern.charAt(length)) {
                length++;
                lps[i] = length;
                i++;
            } else {
                if (length != 0) {
                    length = lps[length - 1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }

    // Function to perform KMP search
    void KMPSearch(String pattern, String text) {
        int m = pattern.length();
        int n = text.length();
        int[] lps = new int[m];

        // Preprocess pattern
        computeLPSArray(pattern, m, lps);

        int i = 0; // index for text
        int j = 0; // index for pattern

        while (i < n) {
            if (pattern.charAt(j) == text.charAt(i)) {
                i++;
                j++;
            }

            if (j == m) {
                System.out.println("Pattern found at index " + (i - j));
                j = lps[j - 1];
            } else if (i < n && pattern.charAt(j) != text.charAt(i)) {
                if (j != 0)
                    j = lps[j - 1];
                else
                    i++;
            }
        }
    }

    // Main method
    public static void main(String[] args) {
        String text = "ABABDABACDABABCABAB";
        String pattern = "ABABCABAB";
        KMPAlgorithm kmp = new KMPAlgorithm();
        kmp.KMPSearch(pattern, text);
    }
}

Output

Pattern found at index 10

Explanation

The KMP (Knuth–Morris–Pratt) Algorithm is an efficient way to search for a smaller string (pattern) inside a larger string (text) without re-checking characters unnecessarily. The algorithm uses two main parts in the code: computeLPSArray() and KMPSearch().

The computeLPSArray() function creates an LPS array (Longest Prefix Suffix) for the pattern. The LPS array stores, for each position in the pattern, the length of the longest proper prefix which is also a suffix. This array is essential because it helps the algorithm know where to resume comparison after a mismatch, avoiding repeated work. In the code, a variable length tracks the current longest prefix-suffix, and i moves through the pattern to fill the LPS array.

The KMPSearch() function performs the actual searching. Two variables, i and j, are used as indexes — i for the text and j for the pattern. The algorithm compares pattern.charAt(j) with text.charAt(i). If they match, both i and j are incremented. If j reaches the length of the pattern (m), it means a full match is found, and the index (i - j) is printed. If a mismatch occurs, j is updated using lps[j - 1] instead of resetting to zero, and i continues through the text. This way, the search efficiently skips characters that have already been matched.

Time & Space Complexity of KMP Algorithm

Time Complexity

CaseTime Complexity
Best CaseO(n)
Average CaseO(n)
Worst CaseO(n)

Space Complexity

CaseSpace Complexity
Best CaseO(m)
Average CaseO(m)
Worst CaseO(m)

Advantages of KMP (Knuth–Morris–Pratt) Algorithm

  • Much faster than the Naïve algorithm: Unlike the naive approach that may repeatedly start comparisons from scratch, KMP avoids unnecessary rechecks, making it significantly faster, especially for long texts.
  • Doesn’t recheck already matched characters: KMP uses the Longest Prefix Suffix (LPS) array to remember which part of the pattern has already been matched, so it doesn’t waste time comparing characters that are already known to match.
  • Works efficiently on large text data: Because it avoids redundant comparisons, KMP can handle very large texts without a dramatic increase in computation time, unlike naive methods that may slow down considerably.
  • Time complexity is linear (O(m + n)): Here, m is the length of the pattern and n is the length of the text. The preprocessing step to build the LPS array takes O(m), and the actual search in the text takes O(n), giving an overall O(m + n) efficiency.
  • Easy to implement once LPS concept is clear: The main complexity lies in understanding the LPS array, but once that is mastered, the algorithm’s implementation is straightforward and systematic.
  • Useful for multiple pattern searches: KMP is very effective when searching for a pattern multiple times within the same text because it doesn’t restart the comparison unnecessarily.
  • Can handle repeated patterns easily: Patterns with repeated prefixes or suffixes are handled naturally using the LPS array, avoiding redundant comparisons of repeated sequences.
  • Saves CPU time and improves performance: By minimizing redundant comparisons and efficiently traversing the text, KMP reduces CPU usage and significantly improves performance over simpler matching methods, especially for large-scale text processing.

Disadvantages of KMP (Knuth–Morris–Pratt) Algorithm

  • Slightly more complex to understand initially: Unlike the naive algorithm, KMP relies on the concept of the Longest Prefix Suffix (LPS) array, which can be tricky for beginners to grasp. Understanding how the LPS array helps skip unnecessary comparisons requires careful study.
  • Requires preprocessing of the pattern: Before searching the text, KMP first builds the LPS array for the pattern. This preprocessing step adds an extra initial computation, unlike naive matching which starts searching immediately.
  • Needs additional space for the LPS array: To store the LPS values, KMP requires extra memory proportional to the length of the pattern (O(m) space). For very large patterns, this can slightly increase memory usage.
  • Harder to debug compared to the Naïve approach: Because KMP involves both preprocessing and careful tracking of indices during the search, debugging mistakes can be more challenging than in the naive method where each comparison is straightforward.
  • Overhead for very small patterns: For very short patterns, the benefits of KMP are minimal, and the overhead of building the LPS array might outweigh its performance gain. In such cases, a simple naive approach can be faster.
  • Implementation errors are common among beginners: Beginners often make mistakes in calculating the LPS array or updating indices during mismatches, which can lead to incorrect results or infinite loops.
  • Cannot handle wildcard patterns directly: KMP is designed for exact matches only. Patterns containing wildcards (like ? or *) cannot be searched without modifying the algorithm.

Conclusion

The Knuth–Morris–Pratt (KMP) Algorithm is a fast and efficient way to search for a pattern within a text. By using the LPS (Longest Prefix Suffix) array, it avoids unnecessary comparisons, making it much faster than the naive approach, especially for large texts or repeated searches. While it requires extra preprocessing and a little more understanding initially, KMP saves time, handles repeated patterns efficiently, and is widely used in applications where quick and reliable string matching is needed.