Among the string matching search algorithms, the two most famous ones are the KMP algorithm (Knuth-Morris-Pratt) and the BM algorithm (Boyer-Moore). Both algorithms have linear search times in the worst case. However, in practice, the KMP algorithm is not much faster than the simplest C library function strstr(), while the BM algorithm is often 3-5 times faster than the KMP algorithm (not personally practiced). However, the BM algorithm is not the fastest algorithm yet. Here is a search algorithm, the Sunday algorithm, which is faster than the BM algorithm.
The idea of Sunday algorithm is very similar to the idea of bad characters in BM algorithm. The only difference is that after the Sunday algorithm fails to match, it takes the character one position behind the current part of the target string that corresponds to the Pattern string to do bad character matching. When it is found that the match fails, it is judged whether the character at the current offset + Pattern string length + 1 (assumed to be the K position) in the parent string exists in the Pattern string. If it exists, align the position with the character in the Pattern string, and then match from the beginning; if it does not exist, move the Pattern string backward, align it with the character at k+1 of the parent string, and then match. Repeat the above operation until it is found, or the parent string is found. I wrote a small example to implement the following algorithm.
In the code, two string matching algorithms are implemented, one is the Sunday method, and the other is the ordinary method of moving one bit at a time. The efficiency comparison of the two is shown in the main function, both at the nanosecond level. The detailed steps of the algorithm have been added with corresponding comments in the code. Regarding the BM algorithm, we will compare and analyze together next time when it is empty.
Copy the code code as follows:
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
/**
* @author Scott
* @date December 28, 2013
* @description
*/
public class SundySearch {
String text = null;
String pattern = null;
int currentPos = 0;
/**
* List of first character positions of the matched substring
*/
List<Integer> matchedPosList = new LinkedList<Integer>();
/**
* Map of matching characters, records which chars are in the matching string and the last displacement of each char
*/
Map<Character, Integer> map = new HashMap<Character, Integer>();
public SundySearch(String text, String pattern) {
this.text = text;
this.pattern = pattern;
this.initMap();
};
/**
* When matching Sunday, it is used to store the last occurrence position of each character in Pattern, in order from left to right.
*/
private void initMap() {
for (int i = 0; i < pattern.length(); i++) {
this.map.put(pattern.charAt(i), i);
}
}
/**
* Ordinary string recursive matching, if the match fails, advance one position
*/
public List<Integer> normalMatch() {
//Failed to match, continue going down
if (!matchFromSpecialPos(currentPos)) {
currentPos += 1;
if ((text.length() - currentPos) < pattern.length()) {
return matchedPosList;
}
normalMatch();
} else {
//Match successfully, record location
matchedPosList.add(currentPos);
currentPos += 1;
normalMatch();
}
return matchedPosList;
}
/**
* Sunday matching, assuming that the position of the K character in Text is: current offset + Pattern string length + 1
*/
public List<Integer> sundayMatch() {
// If no match is successful
if (!matchFromSpecialPos(currentPos)) {
// If the K character in Text does not appear in the Pattern string, skip the entire Pattern string length.
if ((currentPos + pattern.length() + 1) < text.length()
&& !map.containsKey(text.charAt(currentPos + pattern.length() + 1))) {
currentPos += pattern.length();
}else {
// If the K character in Text appears in the Pattern string, align the position of the K character in Text with the position of the last K character in the Pattern string.
if ((currentPos + pattern.length() + 1) > text.length()) {
currentPos += 1;
} else {
currentPos += pattern.length() - (Integer) map.get(text.charAt(currentPos + pattern.length()));
}
}
// When the match is completed, return the initial displacement of all successful matches.
if ((text.length() - currentPos) < pattern.length()) {
return matchedPosList;
}
sundayMatch();
}else {
//If the match is successful, advance one bit and then match again
matchedPosList.add(currentPos);
currentPos += 1;
sundayMatch();
}
return matchedPosList;
}
/**
* Check whether the substring starting from the specified offset of Text matches Pattern
*/
public boolean matchFromSpecialPos(int pos) {
if ((text.length()-pos) < pattern.length()) {
return false;
}
for (int i = 0; i < pattern.length(); i++) {
if (text.charAt(pos + i) == pattern.charAt(i)) {
if (i == (pattern.length()-1)) {
return true;
}
continue;
} else {
break;
}
}
return false;
}
public static void main(String[] args) {
SundySearch sundySearch = new SundySearch("hello ah Adolf adfsadfklf adf234masdfsdfdsfdsfdsffwerwrewrerwerwersdf2666sdflsdfk", "adf");
long begin = System.nanoTime();
System.out.println("NormalMatch:" + sundySearch.normalMatch());
System.out.println("NormalMatch:" + (System.nanoTime() - begin));
begin = System.nanoTime();
System.out.println("SundayMatch:" + sundySearch.sundayMatch());
System.out.println("SundayMatch:" + (System.nanoTime() - begin));
}
}