#KMP

Recently I counted many problem to handle the certain string in a document or a large sequence. All the idea is to use brut force way to find the match string. I was told that there is great way to handle large input called KMP.

I do some search it is O(mn) in the worst case but it runs at average O(m + n). When the input source length m is large enough we can say that the algorithm is O(n).

The idea is there are two phase the first phase is to do some preprocessing over the pattern pat[] and constructs an auxiliary array lps[] of size m (same as size of pattern). Here name lps indicates longest proper prefix which is also suffix

Examples: For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5] For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]

This can be a little hard to understand you can watch a video

Pase 2 is Searching : Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next sliding position. Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match with txt[i-j+1…i-1], and we also know that lps[j-1] characters of pat[0…j-1] are both proper prefix and suffix which means we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. See KMPSearch() in the below code for details.

Talk is cheap:

public int strStr(String source, String target) {
        
        int slen = source.length();
        int tlen = target.length();
        int j = 0;
        int[] lps = new int[tlen];
        char[] pat = target.toCharArray();
        computeLPSArray(pat, tlen, lps);
        int i = 0; // index of source
        while (i < slen) {
            if (pat[j] == source.charAt(i)) {
                i++;
                j++;
            }
            if (j == tlen) {
                return i - j;
            } else if (i < slen && pat[j] != source.charAt(i)) {
                if (j != 0)
                j = lps[j - 1];
                else 
                i = i + 1;
            }
        }
        return -1;
    }
    public void computeLPSArray(char[] pat, int tlen, int[] lps) {
        int len = 0; // length of previous longest prefix suffix
        int i;
        
        lps[0] = 0; // always 0
        i = 1;
        while (i < tlen) {
            if (pat[i] == pat[len]) {
                len++;
                lps[i] = len;
                i++;
                
            } else {
                if (len != 0) {
                    len = lps[len - 1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }