Version: v2.0.1

ScrapingBee Python SDK

ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK makes it easier to interact with ScrapingBee's API.

Installation

You can install the ScrapingBee Python SDK with pip.

pip install scrapingbee

Usage

The ScrapingBee Python SDK is a wrapper around the requests library. ScrapingBee supports GET and POST requests.

Sign up for ScrapingBee to get your API key and some free credits to get started.

Making a GET request

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Block ads on the page you want to scrape
        'block_ads': False,
        # Block images and CSS on the page you want to scrape
        'block_resources': True,
        # Premium proxy geolocation
        'country_code': '',
        # Control the device the request will be sent from
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Wrap response in JSON
        'json_response': False,
        # Interact with the webpage you want to scrape
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},
            ]
        },
        # Use premium proxies to bypass difficult to scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a headless browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before the JavaScript rendering
        'return_page_source': False,
        # Return page screenshot as a PNG image
        'screenshot': False,
        # Take a full page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP code of the page requested
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    }
)
>>> response.text
'<!DOCTYPE html><html lang="en"><head>...'

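ScrapingBee takes various parameters to render JavaScript, execute a custom JavaScript script, use a premium proxy from a specific geolocation, and more.

You can find all the supported parameters in ScrapingBee's documentation.

You can send custom cookies and headers just as you normally would with the requests library.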
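To make the parameter handling more concrete, here is a minimal sketch of how SDK-style params could be flattened into a single request URL. It is not the SDK's own code. It assumes the public HTTP endpoint is https://app.scrapingbee.com/api/v1/, that nested values such as extract_rules and js_scenario are sent as JSON strings, and that booleans are sent as lowercase strings. The helper name build_api_url is hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint of the HTTP API; the SDK normally hides this detail.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_api_url(api_key, url, params=None):
    """Hypothetical helper: flatten SDK-style params into a query string."""
    query = {"api_key": api_key, "url": url}
    for name, value in (params or {}).items():
        if isinstance(value, bool):
            # Assumption: booleans travel as lowercase strings.
            query[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            # Nested rules and scenarios are serialized as compact JSON.
            query[name] = json.dumps(value, separators=(",", ":"))
        else:
            query[name] = str(value)
    return API_ENDPOINT + "?" + urlencode(query)


api_url = build_api_url(
    "REPLACE-WITH-YOUR-API-KEY",
    "https://www.scrapingbee.com/blog/",
    {"render_js": True, "extract_rules": {"title": "h1"}, "window_width": 375},
)
print(api_url)
```

In everyday use you would not build this URL yourself; client.get does the equivalent work when you pass params, headers, and cookies.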

Screenshot

Here is a short example showing how to retrieve and store a screenshot of the ScrapingBee blog at a mobile resolution.

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Take a screenshot
        'screenshot': True,
        # Specify that we need the full height
        'screenshot_full_page': True,
        # Specify a mobile width in pixels
        'window_width': 375
    }
)

>>> if response.ok:
        with open("./scrapingbee_mobile.png", "wb") as f:
            f.write(response.content)
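The response.ok check above only confirms that the HTTP status was successful. If you want an extra safeguard before writing the file, one option is to confirm that the bytes begin with the standard PNG signature. The sketch below is illustrative only: save_png is a hypothetical helper, and the byte strings stand in for response.content so the example runs without a network call.

```python
import os
import tempfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(content, path):
    """Hypothetical helper: write bytes to path only if they look like a PNG."""
    if not content.startswith(PNG_SIGNATURE):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


# Stand-ins for response.content: one image-like payload, one error body.
fake_png = PNG_SIGNATURE + b"rest-of-image"
fake_error = b'{"message": "something went wrong"}'

target = os.path.join(tempfile.mkdtemp(), "scrapingbee_mobile.png")
saved_image = save_png(fake_png, target)
saved_error = save_png(fake_error, target + ".bad")
print(saved_image, saved_error)
```

With a real response you would call save_png(response.content, "./scrapingbee_mobile.png") in place of the plain write.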

Using ScrapingBee with Scrapy

Scrapy is the most popular Python web scraping framework. You can easily integrate ScrapingBee's API with the Scrapy middleware.
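As a rough illustration, a Scrapy project's settings.py might enable the middleware along these lines. This is a sketch under assumptions: it presumes the separate scrapy-scrapingbee package is installed, and the setting name, middleware path, and priority shown here should be verified against that package's documentation before use.

```python
# settings.py (sketch; assumes `pip install scrapy-scrapingbee`)

SCRAPINGBEE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'

DOWNLOADER_MIDDLEWARES = {
    # Assumed middleware path and priority; verify against the package docs.
    'scrapy_scrapingbee.ScrapingBeeMiddleware': 725,
}

# Keep concurrency within what your ScrapingBee plan allows.
CONCURRENT_REQUESTS = 1
```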

Retries

The client includes a retry mechanism for 5xx responses.

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        'render_js': True,
    },
    retries=5
)
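The SDK's internal retry logic is not shown here, but the general idea of retrying on 5xx responses can be sketched as follows. This is an illustration, not the library's implementation: get_with_retries, FakeResponse, and fake_fetch are hypothetical names, and the fake fetcher simulates two server errors followed by a success so the example runs offline.

```python
import time


def get_with_retries(fetch, retries=3, delay=0.0):
    """Call fetch() again while it returns a 5xx status, up to `retries` times.

    fetch is any zero-argument callable returning an object with a
    status_code attribute (a hypothetical stand-in for one API call).
    """
    response = fetch()
    for _ in range(retries):
        if response.status_code < 500:
            break
        time.sleep(delay)
        response = fetch()
    return response


class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


# Simulate two server errors followed by a success.
statuses = iter([500, 503, 200])
calls = []


def fake_fetch():
    code = next(statuses)
    calls.append(code)
    return FakeResponse(code)


result = get_with_retries(fake_fetch, retries=5)
print(result.status_code, len(calls))
```

Only 5xx statuses trigger another attempt in this sketch; a 4xx response is returned immediately, since repeating the same request is unlikely to change a client-side error.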