QueryList v4.1.0 web scraping tool: source code download

QueryList is a simple and elegant PHP web scraping tool built on phpQuery and designed to be highly extensible.

Features

1. Uses the same CSS3 DOM selectors as jQuery

2. Offers the same DOM manipulation API as jQuery

3. Provides a general-purpose solution for scraping lists

4. Includes a powerful HTTP request suite that handles complex requests such as simulated login, browser spoofing, and HTTP proxies

5. Provides a fix for garbled text caused by character-encoding mismatches

6. Offers powerful content filtering; jQuery selectors can be used to filter content

7. Highly modular design that is easy to extend

8. Expressive API

9. High-quality documentation

10. A rich set of plug-ins

11. A professional Q&A community and discussion group

Plug-ins make it easy to implement features such as:

1. Multi-threaded scraping

2. Scraping pages rendered dynamically by JavaScript (PhantomJS/headless WebKit)

3. Saving remote images locally

4. Simulating browser behavior, such as submitting a form

5. Web crawling

6. ......

Requirements

PHP >= 7.0

If you are still on PHP 5, or you do not know how to use Composer, you can use QueryList 3 instead. QueryList 3 supports PHP 5.3 and manual installation.

Installation

Install via Composer:

composer require jaeger/querylist
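The snippets in this document omit the bootstrap lines a script needs before it can call QueryList. A minimal sketch, assuming a standard Composer layout (vendor/autoload.php next to the script) and the QL\QueryList namespace used by version 4; the inline HTML string is a hypothetical stand-in so the example needs no network access:

```php
<?php
// Load Composer's autoloader (path assumes the script sits beside vendor/).
require __DIR__ . '/vendor/autoload.php';

use QL\QueryList;

// Hypothetical HTML used in place of a real page.
$html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>';

// Parse the string and collect the href attribute of every link.
$links = QueryList::html($html)->find('a')->attrs('href');

print_r($links->all());
```

The later examples can be read as continuing from the `use QL\QueryList;` line.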

Usage

Element operations

Collect the addresses of all images on Nipic (nipic.com):

QueryList::get('http://www.nipic.com')->find('img')->attrs('src');

Scrape Baidu search results:

$ql = QueryList::get('http://www.baidu.com/s?wd=QueryList');

$ql->find('title')->text(); // Get the page title
$ql->find('meta[name=keywords]')->content; // Get the keywords from the page head

$ql->find('h3>a')->texts(); // Get the list of search result titles
$ql->find('h3>a')->attrs('href'); // Get the list of search result links

$ql->find('img')->src; // Get the address of the first image
$ql->find('img:eq(1)')->src; // Get the address of the second image
$ql->find('img')->eq(2)->src; // Get the address of the third image
// Iterate over all images
$ql->find('img')->map(function($img){
    echo $img->alt; // Print the image's alt attribute
});

More usage:

$ql->find('#head')->append('<div>Append content</div>')->find('div')->htmls();
$ql->find('.two')->children('img')->attrs('alt'); // Get all img child nodes of the element with class "two"
// Iterate over all child nodes of the element with class "two"
$data = $ql->find('.two')->children()->map(function ($item){
    // Use is() to check the node type
    if($item->is('a')){
        return $item->text();
    }elseif($item->is('img')){
        return $item->alt;
    }
});

$ql->find('a')->attr('href', 'newVal')->removeClass('className')->html('newHtml')->...
$ql->find('div > p')->add('div > ul')->filter(':has(a)')->find('p:first')->nextAll()->andSelf()->...
$ql->find('div.old')->replaceWith( $ql->find('div.new')->clone())->appendTo('.trash')->prepend('Deleted')->...

List scraping

Scrape the titles and links from a Baidu search results list:

$data = QueryList::get('http://www.baidu.com/s?wd=QueryList')
    // Set the scraping rules
    ->rules([
        'title' => array('h3','text'),
        'link' => array('h3>a','href')
    ])
    ->query()->getData();

print_r($data->all());

Result:

Array
(
    [0] => Array
        (
            [title] => QueryList | An extremely powerful PHP collection tool based on phpQuery
            [link] => http://www.baidu.com/link?url=GU_YbDT2IHk4ns1tjG2I8_vjmH0SCJEAPuuZN
        )
    [1] => Array
        (
            [title] => PHP uses QueryList to crawl web page content - wb145230 - Blog Park
            [link] => http://www.baidu.com/link?url=zn0DXBnrvIF2ibRVW34KcRVFG1_bCdZvqvwIhUqiXaS
        )
    [2] => Array
        (
            [title] => Introduction - QueryList guidance document
            [link] => http://www.baidu.com/link?url=pSypvMovqS4v2sWeQo5fDBJ4EoYhXYi0Lxx
        )
    //...
)

Encoding conversion

// Output encoding: UTF-8, input encoding: GB2312
QueryList::get('https://top.etao.com')->encoding('UTF-8','GB2312')->find('a')->texts();

// Output encoding: UTF-8, input encoding: detected automatically
QueryList::get('https://top.etao.com')->encoding('UTF-8')->find('a')->texts();
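To make the terms "input encoding" and "output encoding" concrete, the sketch below converts a GB2312 byte string to UTF-8 with PHP's built-in iconv(). It is independent of QueryList and is not a description of how encoding() is implemented; it only illustrates the kind of conversion that prevents garbled text.

```php
<?php
// The two characters "中文" as raw GB2312 bytes (what a GB2312 page would send).
$gb2312 = "\xD6\xD0\xCE\xC4";

// iconv(from, to, string): the input encoding comes first, the output encoding second.
$utf8 = iconv('GB2312', 'UTF-8', $gb2312);

echo bin2hex($gb2312), PHP_EOL; // d6d0cec4
echo bin2hex($utf8), PHP_EOL;   // e4b8ade69687
echo $utf8, PHP_EOL;            // 中文
```

Note that the argument order is the reverse of the encoding() calls above, which take the output encoding first.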

HTTP network operations (GuzzleHttp)

Log in to Sina Weibo with cookies:

// Scrape a Sina Weibo page that requires login
$ql = QueryList::get('http://weibo.com','param1=testvalue & params2=somevalue',[
    'headers' => [
        // Fill in the cookie obtained from the browser
        'Cookie' => 'SINAGLOBAL=546064; wb_cmtLike_2112031=1; wvr=6;....'
    ]
]);
//echo $ql->getHtml();
echo $ql->find('title')->text();
// Output: My homepage Weibo - discover new things anytime and anywhere

Use an HTTP proxy:

$urlParams = ['param1' => 'testvalue','params2' => 'somevalue'];
$opts = [
    // Set the HTTP proxy
    'proxy' => 'http://222.141.11.17:8118',
    // Set the timeout, in seconds
    'timeout' => 30,
    // Forge HTTP headers
    'headers' => [
        'Referer' => 'https://querylist.cc/',
        'User-Agent' => 'testing/1.0',
        'Accept' => 'application/json',
        'X-Foo' => ['Bar', 'Baz'],
        'Cookie' => 'abc=111;xxx=222'
    ]
];
$ql->get('http://httpbin.org/get',$urlParams,$opts);
// echo $ql->getHtml();

Simulated login

// Log in with a POST request
$ql = QueryList::post('http://xxxx.com/login',[
    'username' => 'admin',
    'password' => '123456'
])->get('http://xxx.com/admin');
// Scrape a page that requires login
$ql->get('http://xxx.com/admin/page');
//echo $ql->getHtml();

Form operations

Simulate logging in to GitHub:

// Get a QueryList instance
$ql = QueryList::getInstance();
// Get the login form
$form = $ql->get('https://github.com/login')->find('form');

// Fill in the GitHub username and password
$form->find('input[name=login]')->val('your github username or email');
$form->find('input[name=password]')->val('your github password');

// Serialize the form data
$formData = $form->serializeArray();
$postData = [];
foreach ($formData as $item) {
    $postData[$item['name']] = $item['value'];
}

// Submit the login form
$actionUrl = 'https://github.com'.$form->attr('action');
$ql->post($actionUrl,$postData);
// Check whether the login succeeded
// echo $ql->getHtml();
$userName = $ql->find('.header-nav-current-user>.css-truncate-target')->text();
if($userName)
{
    echo 'Login successful! Welcome:'.$userName;
}else{
    echo 'Login failed!';
}

Extending QueryList with bound methods

Define a custom myHttp method:

$ql = QueryList::getInstance();

// Bind a myHttp method to the QueryList object
$ql->bind('myHttp',function ($url){
    // $this is the current QueryList object
    $html = file_get_contents($url);
    $this->setHtml($html);
    return $this;
});

// The method can then be called like this
$data = $ql->myHttp('https://toutiao.io')->find('h3 a')->texts();
print_r($data->all());

Or encapsulate the implementation into a class and bind it like this:

$ql->bind('myHttp',function ($url){
    return new MyHttp($this,$url);
});
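The MyHttp class itself is not shown in this document. The sketch below is one hypothetical shape it could take, not the author's implementation: the class body, the load() method name, and the optional $fetch parameter are all assumptions made for illustration. The only thing it expects from the object passed in is a setHtml() method, as used in the closure example above.

```php
<?php
// Hypothetical helper class; the name MyHttp comes from the text, the body is an assumption.
class MyHttp
{
    private $ql;   // the QueryList object handed over by the bound closure
    private $url;  // the page to fetch

    public function __construct($ql, $url)
    {
        $this->ql = $ql;
        $this->url = $url;
    }

    // Fetch the page and hand its HTML to QueryList.
    // $fetch is an optional callable (assumption) so the download step can be swapped out.
    public function load($fetch = null)
    {
        $fetch = $fetch ?: 'file_get_contents';
        $html = $fetch($this->url);
        if ($html === false) {
            throw new RuntimeException("Could not fetch {$this->url}");
        }
        $this->ql->setHtml($html);
        return $this->ql; // return the QueryList object so calls such as find() can follow
    }
}
```

With a class like this, the bound method would be used as `$ql->myHttp($url)->load()->find('h3 a')->texts();`, since the closure returns the MyHttp object rather than the QueryList object.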

Plug-in usage

Use the PhantomJS plug-in to scrape pages rendered dynamically by JavaScript:

// Set the path to the PhantomJS binary when installing the plug-in
$ql = QueryList::use(PhantomJs::class,'/usr/local/bin/phantomjs');

// Scrape the mobile version of Toutiao
$data = $ql->browser('https://m.toutiao.com')->find('p')->texts();
print_r($data->all());

// Use an HTTP proxy
$ql->browser('https://m.toutiao.com',false,[
    '--proxy' => '192.168.1.42:8080',
    '--proxy-type' => 'http'
])

Use the CurlMulti multi-threading plug-in to scrape GitHub trending pages concurrently:

$ql = QueryList::use(CurlMulti::class);
$ql->curlMulti([
    'https://github.com/trending/php',
    'https://github.com/trending/go',
    //.....more urls
])
// Called each time a task completes successfully
->success(function (QueryList $ql,CurlMulti $curl,$r){
    echo "Current url:{$r['info']['url']} \r\n";
    $data = $ql->find('h3 a')->texts();
    print_r($data->all());
})
// Called each time a task fails
->error(function ($errorInfo,CurlMulti $curl){
    echo "Current url:{$errorInfo['info']['url']} \r\n";
    print_r($errorInfo['error']);
})
->start([
    // Maximum number of concurrent requests
    'maxThread' => 10,
    // Number of retries on error
    'maxTry' => 3,
]);