
Rabin-Karp Algorithm: A Complete Beginner’s Guide

Updated on October 26, 2025

The Rabin-Karp algorithm is a well-known and widely used method in computer science for finding a substring (pattern) inside a larger string (text). Unlike naive string matching, which checks every possible position character by character, Rabin-Karp uses a mathematical hash function to quickly compare strings.

Hashing is a method of converting a string into a numerical value (hash value). The algorithm compares hash values first and checks the characters one by one only when the hashes match, which avoids most unnecessary character comparisons and makes the search faster.

Quescol 1-Minute Read

The Rabin-Karp algorithm is a string searching method used to find a pattern inside a larger text efficiently using hashing. Instead of checking every character like the naive approach, it converts strings into numeric hash values. By comparing hashes first, the algorithm avoids unnecessary character comparisons. Only when the hashes match does it compare the actual characters to confirm the match.

This approach makes searching faster and more efficient, especially when dealing with large texts or multiple patterns. The use of a rolling hash allows the algorithm to quickly update the hash value for the next substring without recalculating everything from scratch, which further saves time.

Key Points about Rabin-Karp Algorithm

  • Pattern (P): The substring we want to find.
  • Text (T): The larger string where we search for the pattern.
  • Hash Function: Converts strings to numbers for quick comparison.
  • Rolling Hash: Updates the hash efficiently when sliding the pattern over the text.
  • Time Complexity:
    • Average case: O(n + m)
    • Worst case: O(n × m)
      where n = length of text, m = length of pattern.
  • Best Use Case: Works best when searching for multiple patterns in the same text or when the text is large.
  • Example Applications: Used in plagiarism detection, search engines, and text matching tools.

Let’s Understand in Depth

What is the Rabin-Karp Algorithm?

The Rabin-Karp algorithm is a pattern searching algorithm that uses a rolling hash function to efficiently search for a pattern string P of length m inside a text T of length n.

Important Terms in the Rabin-Karp Algorithm

1). Pattern (P): The smaller string that we are trying to find inside the text.
Example: In text "abcdeabc", the pattern "abc" is what we want to search for.

2). Text (T): The larger string where we are searching for the pattern.
Example: In "abcdeabc", this whole string is the text.

3). Hash Function: A mathematical function that converts a string into a numeric value (hash). This helps compare strings faster because comparing two numbers takes a single step, while comparing two strings of length m can take up to m character comparisons.

4). Rolling Hash: A special type of hash that lets the algorithm quickly calculate the hash of the next substring by using the previous hash value. This avoids recalculating everything from scratch and saves time.

Algorithm

1. Compute the hash of the pattern (P):

Convert the pattern string into a numerical hash value using a hash function. This will be used to quickly compare the pattern with substrings of the text.

2. Compute the hash of the first substring of the text:

Take the first substring of the text with the same length as the pattern and compute its hash value. This is the initial comparison point.

3. Slide the pattern over the text one character at a time:

  • Compare hash values: Check if the hash of the current text substring matches the hash of the pattern.
  • Check characters if hashes match: If the hashes are equal, compare the characters one by one to confirm an exact match, since different strings can sometimes have the same hash (collision).
  • Record the position: If the substring matches the pattern, note the starting index of the match.
  • Update the hash: Use a rolling hash to efficiently calculate the hash of the next substring by removing the first character and adding the next character in the text.

4. Repeat: Continue sliding the window and performing the above steps until the end of the text is reached.
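
The four steps above can be sketched as a small method that returns the match positions instead of printing them. This is only a minimal sketch: the class name RabinKarpSketch, the method name findAll, and the constants BASE = 256 and MOD = 101 are assumptions chosen for illustration, not part of the algorithm's definition.

```java
import java.util.ArrayList;
import java.util.List;

class RabinKarpSketch {
    // Assumed parameters for this sketch: base 256 and a small prime modulus.
    static final int BASE = 256;
    static final int MOD = 101;

    // Returns the 0-based start index of every occurrence of pat in txt.
    static List<Integer> findAll(String pat, String txt) {
        List<Integer> matches = new ArrayList<>();
        int m = pat.length(), n = txt.length();
        if (m == 0 || m > n) return matches;

        long high = 1; // BASE^(m-1) mod MOD, the weight of the leftmost character
        for (int i = 0; i < m - 1; i++) high = (high * BASE) % MOD;

        // Steps 1 and 2: hash the pattern and the first window of the text.
        long patHash = 0, winHash = 0;
        for (int i = 0; i < m; i++) {
            patHash = (patHash * BASE + pat.charAt(i)) % MOD;
            winHash = (winHash * BASE + txt.charAt(i)) % MOD;
        }

        // Steps 3 and 4: slide the window, verify on a hash match, roll the hash.
        for (int i = 0; i <= n - m; i++) {
            if (patHash == winHash && txt.regionMatches(i, pat, 0, m)) {
                matches.add(i);
            }
            if (i < n - m) {
                // Drop the leftmost character, then shift and append the next one.
                winHash = (winHash - txt.charAt(i) * high % MOD + MOD) % MOD;
                winHash = (winHash * BASE + txt.charAt(i + m)) % MOD;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("CDD", "ABCCDDAEFG")); // prints [3]
    }
}
```

Returning a list rather than printing makes the result easier to reuse and to check in tests.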

Example of Rabin-Karp Algorithm

Let’s understand the Rabin-Karp algorithm step by step with the example below:

Text (T): “ABCCDDAEFG”
Pattern (P): “CDD”

Step 1: Find lengths

  • Length of the text, n = 10
  • Length of the pattern, m = 3
    We will compare every substring of length 3 in the text with the pattern “CDD”.

Step 2: Idea Behind Hash Function

The hash function in Rabin-Karp converts a string (like “CDD”) into a number, so we can compare two numbers in a single step instead of comparing the strings character by character.

The formula for hash is usually:

hash(S) = (S[0]×d^(m−1) + S[1]×d^(m−2) + S[2]×d^(m−3) + … + S[m−1]×d^0) mod q

where:

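  • d = number of possible characters (like 26 for the English alphabet or 256 for extended ASCII)
  • q = a prime number used as the modulus (it keeps hash values small enough to avoid overflow and helps spread them evenly, which reduces collisions)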
  • m = length of the pattern

This formula treats the substring like a number in base d.
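
In code, the formula is usually evaluated from left to right with Horner's rule, so no explicit powers of d are needed. The following is a minimal sketch under the assumption that each character's numeric value is simply its char code; the names PolyHash and polyHash are made up for this illustration.

```java
class PolyHash {
    // Evaluates (S[0]*d^(m-1) + S[1]*d^(m-2) + ... + S[m-1]) mod q using Horner's rule:
    // each step multiplies the running value by d, adds the next character, and reduces mod q.
    static long polyHash(String s, int d, int q) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * d + s.charAt(i)) % q; // char code used as the character's value (assumption)
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("CDD", 256, 101));
    }
}
```

Taking the modulus at every step keeps the intermediate value small, so it never grows beyond roughly d × q.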

Step 3: Assign Numeric Values to Characters

Let’s assume we map characters as:

A=1, B=2, C=3, D=4, E=5, F=6, G=7

and take d = 10 (for simplicity) and q = 13 (a small prime).

Step 4: Compute Hash of the Pattern “CDD”

H(P) = (C×10^2 + D×10^1 + D×10^0) mod 13

= (3×100 + 4×10 + 4) mod 13

= (300 + 40 + 4) mod 13 = 344 mod 13 = 6

So, H(P) = 6

Step 5: Compute Hash for First Substring “ABC”

H(ABC) = (1×100 + 2×10 + 3) mod 13

= (100 + 20 + 3) mod 13 = 123 mod 13 = 6

→ Hash is 6, same as pattern hash!
But we must verify characters 
“ABC” ≠ “CDD” → so it’s a false match.
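
The two hand calculations above can be checked with a few lines of Java. This is a sketch that assumes the same mapping (A=1, B=2, and so on), d = 10 and q = 13; the class and method names are illustrative only.

```java
class ExampleHash {
    // Maps 'A'..'G' to 1..7, as in the example (A=1, B=2, ...).
    static int value(char c) {
        return c - 'A' + 1;
    }

    // Polynomial hash of s in base d, reduced modulo q.
    static int hash(String s, int d, int q) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * d + value(s.charAt(i))) % q;
        }
        return h;
    }

    public static void main(String[] args) {
        int d = 10, q = 13;
        int patternHash = hash("CDD", d, q);
        int windowHash = hash("ABC", d, q);
        System.out.println("H(CDD) = " + patternHash); // H(CDD) = 6
        System.out.println("H(ABC) = " + windowHash);  // H(ABC) = 6

        // Equal hashes but different characters: a false match (collision).
        boolean falseMatch = patternHash == windowHash && !"ABC".equals("CDD");
        System.out.println("false match: " + falseMatch); // false match: true
    }
}
```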

Step 6: Use Rolling Hash

Now, instead of recalculating from scratch, we use the rolling hash formula to get the next substring hash quickly:

new_hash = (d × (old_hash − left_char × d^(m−1)) + new_char) mod q

This means:

  • Remove the leftmost character’s contribution (left_char × d^(m−1)) from the old hash
  • Multiply the remaining hash by d
  • Add the new character
  • Take modulo q (and add q if the result is negative)

This way, we slide through the text efficiently.

Step 7: Continue Sliding

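Repeat for:

  • “BCC” → hash = 233 mod 13 = 12 (no match)
  • “CCD” → hash = 334 mod 13 = 9 (no match)
  • “CDD” → hash = 344 mod 13 = 6

When we reach “CDD”,

H(CDD) = (3×100 + 4×10 + 4) mod 13 = 6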

→ Hash matches H(P)
→ Verify characters → all match → Pattern found at position 3.
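
To see the rolling update at work on this example, the sketch below computes the hash of every window of “ABCCDDAEFG” using only the previous hash. It assumes the same mapping (A=1, B=2, ...), d = 10 and q = 13, and that the window length is not larger than the text; the names RollingTrace and windowHashes are hypothetical.

```java
class RollingTrace {
    static int value(char c) {
        return c - 'A' + 1; // A=1, B=2, ... as in the example
    }

    // Returns the hash of every length-m window of txt, computed with the rolling update.
    // Assumes 1 <= m <= txt.length().
    static int[] windowHashes(String txt, int m, int d, int q) {
        int n = txt.length();
        int[] out = new int[n - m + 1];

        int high = 1; // d^(m-1) mod q
        for (int i = 0; i < m - 1; i++) high = (high * d) % q;

        int h = 0; // hash of the first window, computed from scratch
        for (int i = 0; i < m; i++) h = (h * d + value(txt.charAt(i))) % q;
        out[0] = h;

        for (int i = 1; i <= n - m; i++) {
            int left = value(txt.charAt(i - 1));      // character leaving the window
            int next = value(txt.charAt(i + m - 1));  // character entering the window
            h = (d * (h - left * high) + next) % q;
            if (h < 0) h += q; // Java's % can return a negative value
            out[i] = h;
        }
        return out;
    }

    public static void main(String[] args) {
        String txt = "ABCCDDAEFG";
        int[] hashes = windowHashes(txt, 3, 10, 13);
        for (int i = 0; i < hashes.length; i++) {
            System.out.println(txt.substring(i, i + 3) + " -> " + hashes[i]);
        }
        // First four lines: ABC -> 6, BCC -> 12, CCD -> 9, CDD -> 6
    }
}
```

Only the windows “ABC” and “CDD” have hash 6, so only those two need a character-by-character check.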

Implementation of Rabin-Karp Algorithm using Java

public class RabinKarp {
    // d = number of characters in the input alphabet
    public final static int d = 256;

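    public static void search(String pat, String txt, int q) {
        int m = pat.length();
        int n = txt.length();
        // A pattern longer than the text cannot occur in it
        // (and hashing the first window would read past the end of the text)
        if (m > n)
            return;
        int i, j;
        int p = 0; // hash value for pattern
        int t = 0; // hash value for current window of text
        int h = 1; // will hold d^(m-1) % q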

        // The value of h would be "pow(d, m-1) % q"
        for (i = 0; i < m - 1; i++)
            h = (h * d) % q;

        // Calculate the hash value of pattern and first window of text
        for (i = 0; i < m; i++) {
            p = (d * p + pat.charAt(i)) % q;
            t = (d * t + txt.charAt(i)) % q;
        }

        // Slide the pattern over text one by one
        for (i = 0; i <= n - m; i++) {
            // Check hash values
            if (p == t) {
                // Check characters
                for (j = 0; j < m; j++) {
                    if (txt.charAt(i + j) != pat.charAt(j))
                        break;
                }
                if (j == m)
                    System.out.println("Pattern found at index " + i);
            }

            // Calculate hash value for next window
            if (i < n - m) {
                t = (d * (t - txt.charAt(i) * h) + txt.charAt(i + m)) % q;
                if (t < 0)
                    t = (t + q);
            }
        }
    }

    public static void main(String[] args) {
        String txt = "ABCCDDAEFG";
        String pat = "CDD";
        int q = 101; // A prime number for hashing

        search(pat, txt, q);
    }
}

Output

Pattern found at index 3

Explanation

The given code implements the Rabin-Karp algorithm, which is a string searching technique used to find a pattern inside a larger text efficiently using hashing. It starts by defining a constant d = 256, representing the number of characters in the input alphabet (like ASCII characters). The main logic is in the search method, which takes the pattern (pat), the text (txt), and a prime number (q) used for modular hashing to reduce hash collisions.

First, it calculates a value h, which is equivalent to d^(m-1) % q, where m is the length of the pattern. This h is used in the rolling hash computation to remove the contribution of the first character when moving the window across the text. Next, the initial hash values for the pattern p and the first window of the text t are computed using a loop that multiplies by d and takes modulo q to keep the numbers manageable.

The pattern is then slid over the text one character at a time. For each window, it first compares the hash values p and t. If the hash values match, a character-by-character check is performed to confirm that the pattern actually matches the substring, which prevents false positives caused by hash collisions. When a match is found, it prints the starting index of the pattern in the text.

After checking a window, the hash for the next window is updated using the rolling hash formula: subtract the contribution of the first character, multiply the remaining hash by d, add the new character, and apply modulo q. If the new hash is negative, it is adjusted by adding q to ensure it stays positive.
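
The negative-value adjustment is needed because Java's % operator keeps the sign of the left operand. The short sketch below shows this behaviour and an alternative using the standard Math.floorMod method; the class name NegativeMod and the helper normalize are made up for illustration.

```java
class NegativeMod {
    // Reduces x into the range 0..q-1, even when x is negative.
    static int normalize(int x, int q) {
        return Math.floorMod(x, q);
    }

    public static void main(String[] args) {
        int q = 101;
        int raw = -27; // e.g. an intermediate value after removing the leftmost character

        System.out.println(raw % q);           // -27: % keeps the sign of the dividend
        System.out.println(normalize(raw, q)); // 74: always in the range 0..q-1

        // The manual fix used in the implementation gives the same result.
        int t = raw % q;
        if (t < 0) t = t + q;
        System.out.println(t);                 // 74
    }
}
```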

In the main method, a sample text "ABCCDDAEFG" and pattern "CDD" are provided, and a prime number 101 is used for hashing. Calling search(pat, txt, q) finds that the pattern "CDD" occurs starting at index 3 (counting from 0) in the text.

Time Complexity of Rabin-Karp Algorithm

Case            Time Complexity
Best Case       O(n + m)
Average Case    O(n + m)
Worst Case      O(n × m)
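
The worst case appears when almost every window passes the hash check, because each of those windows then needs up to m character comparisons. The sketch below counts those comparisons; it assumes d = 256 and q = 101 as in the implementation above, and the names ComparisonCount and countComparisons are hypothetical.

```java
class ComparisonCount {
    static final int D = 256; // alphabet size (assumed)
    static final int Q = 101; // prime modulus (assumed)

    // Runs Rabin-Karp and returns how many character comparisons the verification step made.
    static int countComparisons(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        if (m == 0 || m > n) return 0;

        int high = 1; // D^(m-1) mod Q
        for (int i = 0; i < m - 1; i++) high = (high * D) % Q;

        int p = 0, t = 0;
        for (int i = 0; i < m; i++) {
            p = (D * p + pat.charAt(i)) % Q;
            t = (D * t + txt.charAt(i)) % Q;
        }

        int comparisons = 0;
        for (int i = 0; i <= n - m; i++) {
            if (p == t) {
                for (int j = 0; j < m; j++) {
                    comparisons++;
                    if (txt.charAt(i + j) != pat.charAt(j)) break;
                }
            }
            if (i < n - m) {
                t = Math.floorMod(D * (t - txt.charAt(i) * high) + txt.charAt(i + m), Q);
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Every window equals the pattern: 8 windows x 3 characters.
        System.out.println(countComparisons("AAA", "AAAAAAAAAA")); // prints 24
        // Few hash matches, so few comparisons.
        System.out.println(countComparisons("CDD", "ABCCDDAEFG"));
    }
}
```

In the first call the number of comparisons is (n − m + 1) × m, which is where the O(n × m) bound comes from.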

Space Complexity of Rabin-Karp Algorithm

Case            Space Complexity
Best Case       O(1)
Average Case    O(1)
Worst Case      O(1)

Advantages of Rabin-Karp Algorithm

  • Efficient for multiple pattern searches: Rabin-Karp can search for many patterns at once efficiently by comparing hash values instead of checking each pattern individually.
  • Reduces unnecessary comparisons: By using hashing, the algorithm avoids comparing characters unless the hash of the pattern matches the substring, saving time.
  • Works well for large texts and small patterns: It is especially effective when the text is long and the pattern is short, as most comparisons are done on hashes.
  • Average-case time complexity O(n + m): On average, the algorithm takes linear time, where n is the text length and m is the pattern length, making it very fast in practice.
  • Simple and easy to implement: The steps are straightforward, especially using a rolling hash to efficiently update substring hashes.
  • Adaptable for DNA sequences: Rabin-Karp can be applied in bioinformatics to match DNA or protein sequences.
  • Useful in plagiarism detection: It can detect copied text efficiently by comparing hashes of sentences or paragraphs.
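
The multiple-pattern advantage can be illustrated with a hash map from pattern hash to patterns, so that each text window is hashed once and looked up once. This is a sketch under clearly stated assumptions: all patterns have the same length (others are skipped), the base is 256, and the modulus is the prime 1,000,000,007. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiPatternSearch {
    static final long BASE = 256;           // assumed alphabet size
    static final long MOD = 1_000_000_007L; // assumed prime modulus

    static long hash(String s, int len) {
        long h = 0;
        for (int i = 0; i < len; i++) h = (h * BASE + s.charAt(i)) % MOD;
        return h;
    }

    // Maps each pattern that occurs in txt to its 0-based start indices.
    // Patterns whose length differs from the first pattern's length are ignored.
    static Map<String, List<Integer>> search(List<String> patterns, String txt) {
        Map<String, List<Integer>> result = new TreeMap<>();
        if (patterns.isEmpty()) return result;
        int m = patterns.get(0).length(), n = txt.length();
        if (m == 0 || m > n) return result;

        // Group patterns by hash so one lookup per window is enough.
        Map<Long, List<String>> byHash = new HashMap<>();
        for (String p : patterns) {
            if (p.length() != m) continue;
            byHash.computeIfAbsent(hash(p, m), k -> new ArrayList<>()).add(p);
        }

        long high = 1; // BASE^(m-1) mod MOD
        for (int i = 0; i < m - 1; i++) high = (high * BASE) % MOD;

        long h = hash(txt, m);
        for (int i = 0; i <= n - m; i++) {
            List<String> candidates = byHash.get(h);
            if (candidates != null) {
                for (String p : candidates) {
                    if (txt.regionMatches(i, p, 0, m)) {
                        result.computeIfAbsent(p, k -> new ArrayList<>()).add(i);
                    }
                }
            }
            if (i < n - m) {
                h = (h - txt.charAt(i) * high % MOD + MOD) % MOD;
                h = (h * BASE + txt.charAt(i + m)) % MOD;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("CDD", "AEF", "XYZ");
        System.out.println(search(patterns, "ABCCDDAEFG")); // prints {AEF=[6], CDD=[3]}
    }
}
```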

Disadvantages of Rabin-Karp Algorithm

  • Worst-case time complexity O(nm): If there are many hash collisions, the algorithm may need to check each character individually, leading to slower performance.
  • Careful prime selection needed: Choosing a suitable prime number q for the hash function is important to minimize collisions.
  • Depends on hash quality: The efficiency of Rabin-Karp relies heavily on having a good hash function that distributes values evenly.
  • Less efficient than some algorithms in worst case: In certain situations, algorithms like Knuth-Morris-Pratt (KMP) may outperform Rabin-Karp, especially when collisions are frequent.
  • No constant-time guarantee per window: A window whose hash matches still needs a character-by-character check that takes up to m steps, so the cost per window is not always constant.
  • Susceptible to collisions: Poorly designed or simple hash functions may lead to many false matches, reducing efficiency.
  • Small bookkeeping overhead: The algorithm keeps a few extra integers (the pattern hash, the current window hash, and d^(m−1) mod q). This is still O(1) extra space, but slightly more than the naive method needs.
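
The effect of the modulus on collisions can be seen on the worked example from earlier. The sketch below counts false matches (windows whose hash equals the pattern hash although the characters differ) for two values of q. It assumes the mapping A=1, B=2, ... and d = 10, and it recomputes each window hash from scratch for clarity rather than rolling it; all names are illustrative.

```java
class SpuriousHits {
    static int value(char c) {
        return c - 'A' + 1; // A=1, B=2, ... as in the worked example
    }

    // Hash of the length-len substring of s starting at index from.
    static int hash(String s, int from, int len, int d, int q) {
        int h = 0;
        for (int i = 0; i < len; i++) h = (h * d + value(s.charAt(from + i))) % q;
        return h;
    }

    // Counts windows whose hash equals the pattern hash but whose characters differ.
    static int spuriousHits(String pat, String txt, int d, int q) {
        int m = pat.length(), n = txt.length();
        int target = hash(pat, 0, m, d, q);
        int count = 0;
        for (int i = 0; i + m <= n; i++) {
            boolean sameHash = hash(txt, i, m, d, q) == target;
            if (sameHash && !txt.regionMatches(i, pat, 0, m)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(spuriousHits("CDD", "ABCCDDAEFG", 10, 13));  // prints 1 ("ABC")
        System.out.println(spuriousHits("CDD", "ABCCDDAEFG", 10, 101)); // prints 0
    }
}
```

A larger prime leaves more possible hash values, so unrelated windows are less likely to land on the pattern's hash.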

Conclusion

The Rabin-Karp algorithm is a fast and efficient string matching algorithm that uses hashing to reduce unnecessary comparisons. By comparing hash values first and verifying only on matches, it improves the search process in large texts. It is widely used in text processing, plagiarism detection, and DNA sequence matching. Although hash collisions can increase the time complexity in the worst case, average performance is highly efficient, making Rabin-Karp a powerful tool for string searching problems.