cURL is a tool that transfers files and data using URL syntax. It supports many protocols, including HTTP, FTP and TELNET. Better still, PHP supports the cURL library. This article introduces some advanced features of cURL and shows how to use them in PHP.
Why use cURL?
Yes, we can fetch web content in other ways. Most of the time, out of laziness, I just use a simple PHP function:
$content = file_get_contents("http://www.bizhicool.com");
// or
$lines = file("http://www.bizhicool.com");
// or
readfile("http://www.bizhicool.com");
However, this approach lacks flexibility and effective error handling. You also cannot use it for harder tasks such as handling cookies, authentication, form submission or file upload.
cURL is a powerful library that supports many protocols and options, and it can report various details about a URL request.
Basic Structure
Before moving on to more complex features, let's look at the basic steps of setting up a cURL request in PHP:
Initialize, set options, execute and obtain the result, release the cURL handle.
// 1. Initialization
$ch = curl_init();
// 2. Set options, including URL
curl_setopt($ch, CURLOPT_URL, "http://www.bizhicool.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
// 3. Execute and obtain the content of the HTML document
$output = curl_exec($ch);
// 4. Release curl handle
curl_close($ch);
The second step, the calls to curl_setopt(), is the most important; this is where all the magic happens. There is a long list of cURL options that control various details of the URL request. It is hard to absorb them all at once, so today we will only try the more common and useful ones.
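The four steps can also be wrapped in a small function. The following is only a sketch: the name fetch_page() and its $extra_options parameter are made up for illustration, and it uses curl_setopt_array() to set several options in one call instead of repeating curl_setopt().

```php
<?php
// Hypothetical helper (not part of the cURL extension): runs the four
// basic steps and returns the response body, or false on failure.
function fetch_page($url, array $extra_options = array()) {
    // 1. Initialization
    $ch = curl_init();
    // 2. Set options; with the array union operator the keys in
    //    $extra_options take precedence over the defaults on the right.
    $options = $extra_options + array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => false,
    );
    curl_setopt_array($ch, $options);
    // 3. Execute and obtain the content
    $output = curl_exec($ch);
    // 4. Release the handle
    curl_close($ch);
    return $output;
}

// Possible usage (performs a real network request, so left commented out):
// $html = fetch_page("http://www.bizhicool.com", array(CURLOPT_TIMEOUT => 10));
```

Keeping the defaults in one array makes it easy to see which options a request uses and to override a single one per call.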
Checking for errors
You can add a statement to check for errors (although this is not required):
// ...
$output = curl_exec($ch);
if ($output === FALSE) {
    echo "cURL Error: " . curl_error($ch);
}
// ...
Note that we compare with "=== FALSE" rather than "== FALSE", because we have to distinguish empty output from the Boolean value FALSE, which indicates a real error.
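To see why the strict comparison matters, here is a minimal sketch that needs no network access. The helper describe_result() is hypothetical; it only mimics how a value shaped like the return of curl_exec() (with CURLOPT_RETURNTRANSFER set) would be interpreted.

```php
<?php
// Hypothetical helper: classify a value shaped like curl_exec()'s return
// value when CURLOPT_RETURNTRANSFER is enabled (string on success,
// false on failure).
function describe_result($output) {
    if ($output === false) {
        return "error";
    }
    if ($output === "") {
        return "empty body";
    }
    return "body of " . strlen($output) . " bytes";
}

// An empty string and "0" are both loosely equal to false in PHP,
// so "== FALSE" would report them as errors.
var_dump("" == false);   // loose comparison: true
var_dump("0" == false);  // loose comparison: true
var_dump("" === false);  // strict comparison: false

echo describe_result(false), "\n";
echo describe_result(""), "\n";
echo describe_result("<html></html>"), "\n";
```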
Get information
This is another optional step that retrieves information about the request after cURL has executed:
// ...
curl_exec($ch);
$info = curl_getinfo($ch);
echo 'Fetching ' . $info['url'] . ' took ' . $info['total_time'] . ' seconds';
// ...
The returned array includes the following information:
"url" // Resource network address
"content_type" // Content type
"http_code" // HTTP status code
"header_size" // Size of the headers
"request_size" // Size of the request
"filetime" // Remote time of the retrieved document
"ssl_verify_result" // SSL verification result
"redirect_count" // Number of redirects
"total_time" // Total time taken
"namelookup_time" // DNS lookup time
"connect_time" // Time taken to establish the connection
"pretransfer_time" // Preparation time before the transfer
"size_upload" // Size of the uploaded data
"size_download" // Size of the downloaded data
"speed_download" // Download speed
"speed_upload" // Upload speed
"download_content_length" // Length of the downloaded content
"upload_content_length" // Length of the uploaded content
"starttransfer_time" // Time until the transfer starts
"redirect_time" // Time spent on redirects
Browser-based redirection
In the first example, we write code that detects whether a server redirects based on the browser. For example, some websites redirect depending on whether the visitor uses a mobile browser, or even on which country the visitor comes from.
We use the CURLOPT_HTTPHEADER option to set the HTTP request headers we send, including the user agent and the preferred language. Then we see whether these websites redirect us to different URLs.
// URLs for testing
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);
// Browser information for testing
$browsers = array(
    "standard" => array (
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array (
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array (
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);
foreach ($urls as $url) {
    echo "URL: $url\n";
    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();
        // Set the URL
        curl_setopt($ch, CURLOPT_URL, $url);
        // Set browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: {$browser['user_agent']}",
            "Accept-Language: {$browser['language']}"
        ));
        // We don't need the page content
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // Just return the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // Return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);
        curl_close($ch);
        // Is there a redirect header?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            echo "$test_name: redirects to $matches[1]\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n\n";
}
First, we define a set of URLs to test and a set of browser profiles. Then a nested loop tests every combination of URL and browser.
Because of the cURL options we set, the returned output contains only the HTTP headers (stored in $output). Using a simple regular expression, we check whether the headers contain "Location:".
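The header check can be tried without any network access. The sketch below assumes a hypothetical helper named find_redirect_target() and made-up sample responses. Unlike the simple pattern above, it anchors the match to the start of a line, ignores case and stops before the line ending, so the captured URL has no trailing carriage return.

```php
<?php
// Hypothetical helper: return the redirect target found in a raw header
// block, or null when there is no Location header.
function find_redirect_target($raw_headers) {
    // "m" lets ^ match at the start of every line, "i" ignores case,
    // and [^\r\n]* stops before the CRLF that ends the header line.
    if (preg_match('/^Location:[ \t]*([^\r\n]*)/mi', $raw_headers, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Made-up responses, for illustration only.
$redirecting = "HTTP/1.1 302 Found\r\nLocation: http://m.example.com/\r\nContent-Length: 0\r\n\r\n";
$plain = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

echo find_redirect_target($redirecting), "\n";
var_dump(find_redirect_target($plain));
```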
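Running the redirect test prints, for each URL, one line per browser profile saying either where the request was redirected or that there was no redirection.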
Sending data using the POST method
When making a GET request, data can be passed to a URL via the query string. For example, when you search on Google, the search term is part of the URL's query string:
http://www.google.com/search?q=nettuts
In this case you probably don't need cURL. Passing this URL to file_get_contents() gives the same result.
However, some HTML forms are submitted using the POST method. When such a form is submitted, the data is sent in the HTTP request body instead of the query string. For example, the search form on the CodeIgniter forums always POSTs to the following page, whatever keywords you enter:
http://codeigniter.com/forums/do_search/
You can use a PHP script to simulate this kind of request. First, create a new file that accepts and displays the POST data. We name it post_output.php:
print_r($_POST);
Next, write a PHP script to perform the cURL request:
$url = "http://localhost/post_output.php";
$post_data = array (
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// We are POSTing data!
curl_setopt($ch, CURLOPT_POST, 1);
//Add the post variable
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
After executing the code, you should see the contents of $_POST printed back, containing the foo, query and action fields.
This script sends a POST request to post_output.php, which prints its $_POST variable. We capture that output using cURL.
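One detail worth knowing: when CURLOPT_POSTFIELDS receives an array, cURL sends the data as multipart/form-data, while a URL-encoded string is sent as application/x-www-form-urlencoded, which is what a plain HTML form submits by default. The sketch below reuses the $post_data values from above and encodes them with http_build_query(); it performs no request.

```php
<?php
$post_data = array(
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);

// Encode the fields as a URL-encoded string; the separator is passed
// explicitly so the result does not depend on php.ini settings.
$encoded = http_build_query($post_data, '', '&');
echo $encoded, "\n"; // prints foo=bar&query=Nettuts&action=Submit

// With a real handle, the string would be passed instead of the array:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $encoded);
```

Some servers only accept one of the two encodings, so this is the first thing to check when a simulated form submission is rejected.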
File Upload
Uploading files is very similar to the previous POST example, because all file upload forms are submitted through the POST method.
First, create a new page to receive files, named upload_output.php:
print_r($_FILES);
The following script performs the actual file upload:
$url = "http://localhost/upload_output.php";
$post_data = array (
    "foo" => "bar",
    // The local file to upload (CURLFile requires PHP 5.5 or later)
    "upload" => new CURLFile("C:/wamp/www/test.zip")
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
To upload a file, pass it like any other POST variable, but wrap the path in a CURLFile object. Older PHP versions used the path prefixed with the @ symbol instead; that form is no longer supported in PHP 7. Executing this script should print the contents of $_FILES for the uploaded file.
cURL batch processing (multi cURL)
cURL also has an advanced feature: batch handles (the multi interface). This feature lets you open multiple URL connections simultaneously and asynchronously.
Here is sample code from php.net:
// Create two cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// Specify the URLs and appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
// Create the cURL batch handle
$mh = curl_multi_init();
// Add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
// Predefine a state variable
$active = null;
// Execute the batch
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// Close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
What you need to do here is open multiple cURL handles and add them to a batch handle. Then you just wait in a while loop for the transfers to complete.
There are two main loops in this example. The first do-while loop calls curl_multi_exec() repeatedly. This function is non-blocking: it does as little work as possible and returns a status value. As long as this value equals the constant CURLM_CALL_MULTI_PERFORM, there is still urgent work to do (for example, sending the HTTP headers for a URL). In other words, we need to keep calling this function until the return value changes.
The following while loop continues only while the $active variable is true. This variable was passed to curl_multi_exec() as the second parameter and indicates whether there are still active connections in the batch handle. Next, we call curl_multi_select(), which blocks until there is activity on one of the connections (such as a server response arriving). When it returns successfully, we enter another do-while loop to process the pending work.
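The same pattern can be generalized to any number of URLs. The following is a sketch under stated assumptions, not the php.net sample: the function name fetch_all() is made up, the URLs are assumed to be unique (they are used as array keys), and the two loops are merged into a single loop that calls curl_multi_select() with a one-second timeout between rounds of work.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently and return the
// response bodies keyed by URL. Assumes the URLs are unique.
function fetch_all(array $urls) {
    if (!$urls) {
        return array();
    }
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    $active = null;
    do {
        // Do whatever work is ready without blocking
        $mrc = curl_multi_exec($mh, $active);
        if ($active && $mrc == CURLM_OK) {
            // Wait for activity on any connection, at most one second
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $mrc == CURLM_OK);
    $results = array();
    foreach ($handles as $url => $ch) {
        // Body of a finished transfer (requires CURLOPT_RETURNTRANSFER)
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Possible usage (performs real network requests, so left commented out):
// $pages = fetch_all(array("http://lxr.php.net/", "http://www.php.net/"));
```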
Let's look at how to put this feature to practical use:
WordPress Link Checker
Imagine you have a blog with a large number of articles that contain many links to external websites. After a while, a good number of these links stop working for one reason or another. Either the page was censored, or the entire site was hacked...
Let's write a script that analyzes all these links, finds the websites and pages that cannot be opened or return 404, and generates a report.
Please note that the following is not a real working WordPress plug-in. It is a standalone script, for demonstration only.
OK, let's get started. First, read all the links from the database:
// CONFIG
$db_host = 'localhost';
$db_user = 'root';
$db_pass = '';
$db_name = 'wordpress';
$excluded_domains = array(
    'localhost', 'www.mydomain.com');
$max_connections = 10;
// Initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;
// Connect to MySQL (mysqli replaces the mysql_* functions removed in PHP 7)
$db = mysqli_connect($db_host, $db_user, $db_pass);
if (!$db) {
    die('Could not connect: ' . mysqli_connect_error());
}
if (!mysqli_select_db($db, $db_name)) {
    die('Could not select db: ' . mysqli_error($db));
}
// Find all articles with links
$q = "SELECT post_content FROM wp_posts
    WHERE post_content LIKE '%href=%'
    AND post_status = 'publish'
    AND post_type = 'post'";
$r = mysqli_query($db, $q) or die(mysqli_error($db));
while ($d = mysqli_fetch_assoc($r)) {
    // Match links with a regular expression
    if (preg_match_all("!href=\"(.*?)\"!", $d['post_content'], $matches)) {
        foreach ($matches[1] as $url) {
            // Exclude some domains (relative links have no host)
            $tmp = parse_url($url);
            $host = isset($tmp['host']) ? $tmp['host'] : '';
            if (in_array($host, $excluded_domains)) {
                continue;
            }
            // Store the URL
            $url_list[] = $url;
        }
    }
}
// Remove duplicate links
$url_list = array_values(array_unique($url_list));
if (!$url_list) {
    die('No URL to check');
}
We first configure the database, the list of domain names to exclude ($excluded_domains), and the maximum number of concurrent connections ($max_connections). Then we connect to the database, fetch the articles and the links they contain, and collect the links into an array ($url_list).
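The link-collection step can be tested on its own, without a database. The sketch below assumes a hypothetical helper named extract_links() and a made-up HTML fragment. As a variation on the script above, it also skips relative links, since they have no host to check.

```php
<?php
// Hypothetical helper: pull link targets out of an HTML fragment,
// skipping relative links and hosts listed in $excluded_domains.
function extract_links($html, array $excluded_domains) {
    $urls = array();
    if (preg_match_all('!href="(.*?)"!', $html, $matches)) {
        foreach ($matches[1] as $url) {
            $host = parse_url($url, PHP_URL_HOST);
            // parse_url() yields null (or false) when there is no usable host
            if (!$host || in_array($host, $excluded_domains)) {
                continue;
            }
            $urls[] = $url;
        }
    }
    // Remove duplicates and reindex
    return array_values(array_unique($urls));
}

// Made-up fragment, for illustration only.
$html = '<a href="http://example.com/a">a</a> <a href="/local">b</a> '
      . '<a href="http://localhost/x">c</a> <a href="http://example.com/a">d</a>';
print_r(extract_links($html, array('localhost', 'www.mydomain.com')));
```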
The code below is a bit complicated, so I will explain it in detail step by step:
// 1. Batch processor
$mh = curl_multi_init();
// 2. Add the first batch of URLs
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}
// 3. Initial processing
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// 4. Main loop
while ($active && $mrc == CURLM_OK) {
    // 5. There is an active connection
    if (curl_multi_select($mh) != -1) {
        // 6. Work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        // 7. Is there any information?
        if ($mhinfo = curl_multi_info_read($mh)) {
            // This means a connection has finished
            // 8. Get information from the curl handle
            $chinfo = curl_getinfo($mhinfo['handle']);
            // 9. Dead link?
            if (!$chinfo['http_code']) {
                $dead_urls[] = $chinfo['url'];
            // 10. 404?
            } else if ($chinfo['http_code'] == 404) {
                $not_found_urls[] = $chinfo['url'];
            // 11. Still available
            } else {
                $working_urls[] = $chinfo['url'];
            }
            // 12. Remove the handle
            curl_multi_remove_handle($mh, $mhinfo['handle']);
            curl_close($mhinfo['handle']);
            // 13. Add a new URL and do the work
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}
// 14. Finished
curl_multi_close($mh);
echo "==Dead URLs==\n";
echo implode("\n", $dead_urls) . "\n\n";
echo "==404 URLs==\n";
echo implode("\n", $not_found_urls) . "\n\n";
echo "==Working URLs==\n";
echo implode("\n", $working_urls);
// 15. Add a URL to the batch processor
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;
    // If there are URLs left to process
    if (isset($url_list[$index])) {
        // Create a new curl handle
        $ch = curl_init();
        // Set the URL
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // Don't print the returned content
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Follow redirects wherever they lead
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // No body is needed, which saves bandwidth and time
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // Add the handle to the batch processor
        curl_multi_add_handle($mh, $ch);
        // Advance the counter so the next call adds the next URL
        $index++;
        return true;
    } else {
        // No more URLs to process
        return false;
    }
}
The code is explained below. The numbers in the list correspond to the numbers in the code comments.
1. Create a new batch processor (a multi handle).
2. Later we will create a function, add_url_to_multi_handle(), that adds URLs to the batch processor. Each call adds one new URL. Initially, we add 10 URLs (this number is determined by $max_connections).
3. We have to run curl_multi_exec() to do the initial work; as long as it returns CURLM_CALL_MULTI_PERFORM, there is still something to do. This mainly creates the connections; it does not wait for the full response.
4. The main loop continues as long as there are active connections in the batch.
5. curl_multi_select() waits until there is activity on one of the connections.
6. cURL goes to work again, mainly to receive response data.
7. Check for information. When a URL request has completed, an array is returned.
8. The returned array contains a cURL handle. We use it to get the information for that single request.
9. If the link is dead or the request timed out, there is no HTTP status code.
10. If the page cannot be found, the status code is 404.
11. Otherwise, we assume the link works (of course, you can also check for 500 errors and the like...).
12. Remove the cURL handle from the batch because it is no longer needed, and close it.
13. Now another URL can be added, and the initial work starts again...
14. Everything is done. Close the batch processor and generate the report.
15. Finally, the function that adds a new URL to the batch processor. Every time it is called, the static variable $index is incremented, so we know which URL to process next.
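Steps 9 to 11 can be isolated into a small function, which also makes it easy to add the check for server errors mentioned in step 11. This is only a sketch: the name classify_link() and the 'server_error' category are made up for illustration, and the arrays passed in merely imitate what curl_getinfo() returns.

```php
<?php
// Hypothetical helper: classify a link from an array shaped like the
// result of curl_getinfo(). Only the 'http_code' key is inspected.
function classify_link(array $chinfo) {
    $code = isset($chinfo['http_code']) ? (int) $chinfo['http_code'] : 0;
    if ($code === 0) {
        // No status code: dead link or timeout
        return 'dead';
    }
    if ($code === 404) {
        return 'not_found';
    }
    if ($code >= 500) {
        // Extra category, not present in the script above
        return 'server_error';
    }
    return 'working';
}

echo classify_link(array('http_code' => 0)), "\n";
echo classify_link(array('http_code' => 404)), "\n";
echo classify_link(array('http_code' => 503)), "\n";
echo classify_link(array('http_code' => 200)), "\n";
```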
I ran this script on my blog (some broken links were added intentionally for testing). The report lists the dead URLs, the 404 URLs and the working URLs in three groups.
A total of about 40 URLs were checked in less than two seconds. When you need to check a larger number of URLs, the time saved adds up quickly: with 10 connections open at the same time, the check can run up to roughly 10 times faster. In addition, you can take advantage of the non-blocking nature of cURL batch processing to handle a large number of URL requests without blocking your web scripts.
Some other useful cURL options
HTTP Authentication
If a URL requires HTTP-based authentication, you can use the following code:
$url = "http://www.somesite.com/members/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send the username and password
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");
// Allow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// This option lets cURL keep sending the username
// and password after a redirect
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, 1);
$output = curl_exec($ch);
curl_close($ch);
FTP upload
PHP comes with its own FTP library, but you can also use cURL:
// Open a file pointer
$fp = fopen("/path/to/file", "r");
// The URL contains most of the required information
$url = "ftp://username:[email protected]:21/path/to/new/file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Upload-related options
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));
// Enable ASCII mode (useful when uploading text files);
// CURLOPT_FTPASCII is an older alias of this option
curl_setopt($ch, CURLOPT_TRANSFERTEXT, 1);
$output = curl_exec($ch);
curl_close($ch);
fclose($fp);
Using a proxy
You can send a cURL request through a proxy:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//Specify proxy address
curl_setopt($ch, CURLOPT_PROXY, '11.11.11.11:8080');
// Provide username and password if required
curl_setopt($ch, CURLOPT_PROXYUSERPWD,'user:pass');
$output = curl_exec($ch);
curl_close($ch);
Callback functions
cURL can call a specified callback function during a URL request. For example, you can start using the content as soon as it begins downloading, rather than waiting until the download completes.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://net.tutsplus.com');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "progress_function");
curl_exec($ch);
curl_close($ch);
function progress_function($ch, $str) {
    echo $str;
    return strlen($str);
}
This callback function must return the length of the string it received, otherwise the transfer will not work properly.
While the response is being received, this function is called every time a chunk of data arrives.
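A callback does not have to be a named function; CURLOPT_WRITEFUNCTION also accepts a closure. The sketch below assumes a hypothetical factory named make_counting_writer() that builds a callback counting the bytes received. To stay independent of the network, the callback is invoked by hand with made-up chunks and a null handle.

```php
<?php
// Hypothetical factory: builds a write callback that adds up the number
// of bytes received in the variable passed by reference.
function make_counting_writer(&$total_bytes) {
    $total_bytes = 0;
    return function ($ch, $chunk) use (&$total_bytes) {
        $length = strlen($chunk);
        $total_bytes += $length;
        // Returning the chunk length tells cURL the chunk was handled
        return $length;
    };
}

$total = 0;
$writer = make_counting_writer($total);
// With a real request: curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writer);
// Here the callback is called by hand with made-up chunks.
$writer(null, "hello ");
$writer(null, "world");
echo $total, "\n"; // prints 11
```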
Summary
Today we learned about the power and flexibility of the cURL library. I hope you liked it. The next time you need to make a URL request, consider cURL!
Thanks!
Original text: Quick start with cURL based on PHP
Original English text: http://net.tutsplus.com/tutorials/php/techniques-and-resources-for-mastering-curl/
Original author: Burak Guzel
The source must be retained for reprinting.