This article demonstrates how to use Java to crawl and capture the content of a web page. It is shared for your reference; the details are as follows:

I have recently been studying web-crawling techniques with Java and have just gotten started, so let me share my experience with you. There are two methods below: one uses the HttpClient package provided by Apache, the other uses only Java's own built-in classes.
The code is as follows:
// The first method.
// This method uses the package provided by Apache; it is simple and convenient,
// but it requires the following jars on the classpath:
// commons-codec-1.4.jar
// commons-httpclient-3.1.jar
// commons-logging-1.0.4.jar
public static String createhttpClient(String url, String param) {
    HttpClient client = new HttpClient();
    String response = null;
    String keyword = null;
    PostMethod postMethod = new PostMethod(url);
    // try {
    //     if (param != null)
    //         keyword = new String(param.getBytes("gb2312"), "ISO-8859-1");
    // } catch (UnsupportedEncodingException e1) {
    //     e1.printStackTrace();
    // }
    // NameValuePair[] data = { new NameValuePair("keyword", keyword) };
    // // put the form values into the PostMethod
    // postMethod.setRequestBody(data);
    // The commented-out part above passes POST parameters; I disabled it myself.
    // You can uncomment it and experiment with it.
    try {
        client.executeMethod(postMethod);
        response = new String(postMethod.getResponseBodyAsString()
                .getBytes("ISO-8859-1"), "gb2312");
        // Note: replace gb2312 here with the encoding of the page you are crawling.
        String p = response.replaceAll("&[a-zA-Z]{1,10};", "")
                .replaceAll("<[^>]*>", ""); // strip the HTML tags from the page
        System.out.println(p);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return response;
}

// The second method.
// This method uses Java's own URL classes to capture the page content.
public String getPageContent(String strUrl, String strPostRequest, int maxLength) {
    // buffer for the resulting page
    StringBuffer buffer = new StringBuffer();
    System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
    System.setProperty("sun.net.client.defaultReadTimeout", "5000");
    try {
        URL newUrl = new URL(strUrl);
        HttpURLConnection hConnect = (HttpURLConnection) newUrl.openConnection();
        // extra data for the POST method
        if (strPostRequest.length() > 0) {
            hConnect.setDoOutput(true);
            OutputStreamWriter out = new OutputStreamWriter(hConnect.getOutputStream());
            out.write(strPostRequest);
            out.flush();
            out.close();
        }
        // read the content
        BufferedReader rd = new BufferedReader(new InputStreamReader(hConnect.getInputStream()));
        int ch;
        for (int length = 0; (ch = rd.read()) > -1
                && (maxLength <= 0 || length < maxLength); length++)
            buffer.append((char) ch);
        String s = buffer.toString();
        s = s.replaceAll("&[a-zA-Z]{1,10};", "")
                .replaceAll("<[^>]*>", ""); // strip the HTML tags
        System.out.println(s);
        rd.close();
        hConnect.disconnect();
        return buffer.toString().trim();
    } catch (Exception e) {
        // return "Error: failed to read the page!";
        return null;
    }
}
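A subtle point in the first method is the charset conversion: the response body is read as ISO-8859-1 and then re-decoded as gb2312. Below is a minimal, self-contained sketch of that re-decoding trick (the class and method names are my own, not from the article):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFix {
    static final Charset GB2312 = Charset.forName("GB2312");

    // Re-decodes a string that was read with ISO-8859-1
    // but whose bytes are actually GB2312-encoded text.
    static String reDecode(String misread) {
        // ISO-8859-1 maps every byte 0-255 to one char, so this
        // round trip recovers the original bytes losslessly.
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        // Simulate a GB2312 page body decoded with the wrong charset:
        byte[] pageBytes = "中文".getBytes(GB2312);
        String misread = new String(pageBytes, StandardCharsets.ISO_8859_1);
        System.out.println(reDecode(misread)); // prints "中文"
    }
}
```

This works because ISO-8859-1 is a lossless byte-to-char mapping; the same trick applies with any real page encoding (e.g. UTF-8) in place of GB2312.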
Then write a test class:
public static void main(String[] args) {
    String url = "//www.vevb.com";
    String keyword = "Wulin.com";
    CreateHttpClient p = new CreateHttpClient();
    String response = p.createhttpClient(url, keyword); // the first method
    // p.getPageContent(url, "post", 100500);           // the second method
}
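The maxLength parameter in the second method caps how many characters are read from the connection (with 0 or a negative value meaning "no cap"). Here is a minimal sketch of that read loop, exercised against a StringReader instead of a live connection so it can run offline (the class and method names are my own, not from the article):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CappedRead {
    // Reads at most maxLength chars from a reader, mirroring the
    // loop in getPageContent; maxLength <= 0 means unlimited.
    static String readCapped(Reader r, int maxLength) {
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(r)) {
            int ch;
            for (int length = 0; (ch = rd.read()) > -1
                    && (maxLength <= 0 || length < maxLength); length++)
                buffer.append((char) ch);
        } catch (IOException e) {
            // on error, return whatever was read so far
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(readCapped(new StringReader("abcdefgh"), 5)); // prints "abcde"
        System.out.println(readCapped(new StringReader("abc"), 0));      // prints "abc"
    }
}
```

In the real method the Reader would be an InputStreamReader wrapped around the HttpURLConnection's input stream; the truncation logic is identical.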
Haha, now look at the console: do you see the content of the webpage?
I hope this article is helpful to everyone's Java programming.