This article describes a Java method for extracting the most frequent keywords from an article, using Lucene 3.6.2 and IKAnalyzer 2012 for word segmentation. It is shared here for your reference, as follows:
Implementation code:
/**
 * Required jar packages:
 *   lucene-core-3.6.2.jar, lucene-memory-3.6.2.jar,
 *   lucene-highlighter-3.6.2.jar, lucene-analyzers-3.6.2.jar,
 *   IKAnalyzer2012.jar
 *
 * Extracts the words that occur most frequently in an article, sorts them in
 * descending order of frequency, and returns the top n keywords as an array.
 *
 * This class contains a list2Map method, which converts a List<String> that
 * may hold duplicates into a Map<String, Integer>, where each value is the
 * number of times the string occurs in the list.
 */
package com.lifeix.api.util;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Extracts keywords from an article.
 * @author anwj
 */
public class WordUtil {

    /** Test article. */
    static String keyWord =
        "The comedian Pan Changjiang has become a \"male matchmaker\", but this time it is not a sketch: "
        + "the urban comedy \"Male Matchmaker\", written, directed and acted by him, will premiere on Beijing TV on January 13. "
        + "In the play, Pan Changjiang transforms into the \"new era male matchmaker\" Ding Erchun, who is versatile and warm-hearted, "
        + "and stages a romantic \"Silk Counterattack\" love story with Zhang Ting, \"Taiwan's No. 1 Dilm Beauty 2\". "
        + "Stars such as Li Mingqi, Li Wenqi, Feng Yuanzheng, Ren Chengwei, Ma Li, Xu and others also joined forces to create \"joking materials\". "
        + "\"Male Matchmaker\" revolves around Ding Erchun and the \"全成热恋\" marriage agency he runs. "
        + "The middle-aged Ding Erchun sees endless business opportunities in the surging \"marriage and dating wave\" "
        + "and hopes to use his silver tongue to start a \"second spring\" in both career and life. "
        + "The marriage agency gets off to a good start and customers crowd in, but their requirements are all kinds of strange. "
        + "Money-worshiping women, otaku and small-time bosses take the stage in turn, launching a series of hilarious stories. "
        + "A major highlight of the play is the pairing of a beauty with an \"ugly man\": "
        + "Zhang Ting and Pan Changjiang play a pair of bickering lovers. "
        + "Zhang Ting said that the two in the play \"have a gap in height, a gap in age, and an unequal appearance.\" "
        + "When Pan Changjiang talked about this protagonist setting, he believed that "
        + "\"Zhang Ting's previous roles were very independent and cute, and 'big woman' and 'little man' "
        + "are the settings of our couple, so Zhang Ting is a very suitable candidate.\" "
        + "In addition, the drama is the fourth comedy Pan Changjiang has directed and acted in himself, "
        + "after \"The Great Man Feng Tiangui\" and the first and second parts of \"The Clear Water and Blue Sky\". "
        + "Pan Changjiang said that the whole play shows the various marriage and love values in contemporary society "
        + "through the perspective of the special profession of \"matchmaker\", covering many hotly discussed topics of the times, "
        + "such as dusk love, gold-worshiping women, and phoenix men. (Reporter Yin Chunfang) "
        + "Disclaimer: This article only represents the author's personal views and has nothing to do with Global Network. "
        + "Its originality and the words and content of the statements in the article have not been verified by this website. "
        + "This website does not make any guarantee or commitment to the authenticity, completeness and timeliness "
        + "of this article and all or part of the content and text. "
        + "Readers are requested to use it for reference only, and please verify the relevant content yourself.";

    /** Number of keywords to return. */
    private final static Integer NUM = 5;

    /** Length threshold: only words longer than this many characters are kept. */
    private final static Integer QUANTITY = 1;

    /**
     * Takes an article as a String and uses IKAnalyzer smart segmentation
     * to split it into a list of words.
     * @param article the article text
     * @param a       length threshold; only words longer than a characters are kept
     * @return the extracted words, duplicates included
     * @throws IOException
     */
    private static List<String> extract(String article, Integer a) throws IOException {
        List<String> list = new ArrayList<String>();    // Receives the extracted words
        IKAnalyzer analyzer = new IKAnalyzer();         // Initialize IKAnalyzer
        analyzer.setUseSmart(true);                     // Switch IKAnalyzer to smart segmentation
        // Call tokenStream to read the article as a character stream
        TokenStream tokenStream = analyzer.tokenStream("", new StringReader(article));
        // The attribute instance is reused for every token, so fetch it once
        CharTermAttribute charTermAttribute = tokenStream.getAttribute(CharTermAttribute.class);
        try {
            tokenStream.reset();                        // Lucene's TokenStream workflow: reset before consuming
            while (tokenStream.incrementToken()) {      // Loop over the segmented words
                String word = charTermAttribute.toString();  // Convert the term to a String
                if (word.length() > a) {                // Keep words longer than a characters (default: 2 or more)
                    list.add(word);
                }
            }
            tokenStream.end();
        } finally {
            tokenStream.close();
        }
        return list;
    }

    /**
     * Converts the strings in the list into map keys. The value starts at 1
     * and is incremented each time the same string appears again.
     * @param list the words to count
     * @return a map from each word to its number of occurrences
     */
    private static Map<String, Integer> list2Map(List<String> list) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String key : list) {
            // Every key comes from the list itself, so no contains() check is needed
            Integer count = map.get(key);
            map.put(key, count == null ? 1 : count + 1);
        }
        return map;
    }

    /**
     * Extracts keywords from an article.
     * @param article the article text
     * @param a       length threshold; only words longer than a characters are kept
     * @param n       maximum number of keywords to return
     * @return up to n keywords, most frequent first
     * @throws IOException
     */
    public static String[] getKeyWords(String article, Integer a, Integer n) throws IOException {
        List<String> keyWordsList = extract(article, a);        // Split the article into words
        Map<String, Integer> map = list2Map(keyWordsList);      // Count the occurrences of each word
        // Sort the map entries by value in descending order using Collections.sort
        ArrayList<Entry<String, Integer>> list =
                new ArrayList<Entry<String, Integer>>(map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o2.getValue().compareTo(o1.getValue());
            }
        });
        if (list.size() < n) {
            n = list.size();                            // Cap n so the array holds no null entries
        }
        String[] keyWords = new String[n];              // Array for the keywords to be returned
        for (int i = 0; i < n; i++) {
            keyWords[i] = list.get(i).getKey();         // Copy the top n keywords into the array
        }
        return keyWords;
    }

    /**
     * Extracts keywords using the default settings (QUANTITY and NUM).
     * @param article the article text
     * @return up to NUM keywords, most frequent first
     * @throws IOException
     */
    public static String[] getKeyWords(String article) throws IOException {
        return getKeyWords(article, QUANTITY, NUM);
    }

    public static void main(String[] args) {
        try {
            String[] keywords = getKeyWords(keyWord);
            for (int i = 0; i < keywords.length; i++) {
                System.out.println(keywords[i]);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
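The counting and ranking steps do not depend on Lucene or IKAnalyzer, so they can be tried in isolation. The following is a minimal sketch, assuming the text has already been split into tokens. The class name KeywordCountSketch and the methods countTokens and topN are hypothetical names used only for illustration, and the alphabetical tie-break is an addition that WordUtil does not have; it only makes the result deterministic when two words occur equally often.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the original WordUtil.
class KeywordCountSketch {

    /** Counts how many times each token occurs in the list. */
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String token : tokens) {
            Integer seen = counts.get(token);
            counts.put(token, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    /** Returns up to n tokens, most frequent first; ties are broken alphabetically. */
    static String[] topN(List<String> tokens, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(countTokens(tokens).entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Higher counts first; compareTo avoids the overflow risk of subtraction
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        // Clamp n to the range [0, number of distinct tokens]
        int size = Math.min(Math.max(n, 0), entries.size());
        String[] result = new String[size];
        for (int i = 0; i < size; i++) {
            result[i] = entries.get(i).getKey();
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the output of a tokenizer such as IKAnalyzer
        List<String> tokens = Arrays.asList(
                "matchmaker", "comedy", "matchmaker", "actor", "comedy", "matchmaker");
        System.out.println(Arrays.toString(topN(tokens, 2)));
    }
}
```

With the sample tokens above, "matchmaker" occurs three times and "comedy" twice, so main prints [matchmaker, comedy]. Asking for more keywords than there are distinct tokens simply returns all of them, which mirrors the size check in getKeyWords.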
I hope this article is helpful for your Java programming.