scrapingbee python download - scrapingbee python source code download

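ScrapingBee Python SDK

ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK makes it easier to interact with ScrapingBee's API.

Installation

You can install the ScrapingBee Python SDK with pip.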

pip install scrapingbee

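Usage

The ScrapingBee Python SDK is a wrapper around the requests library. ScrapingBee supports GET and POST requests.

Sign up for ScrapingBee to get your API key and some free credits to get started.

Making a request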

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Block ads on the page you want to scrape
        'block_ads': False,
        # Block images and CSS on the page you want to scrape
        'block_resources': True,
        # Premium proxy geolocation
        'country_code': '',
        # Control the device the request will be sent from
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Wrap response in JSON
        'json_response': False,
        # Interact with the webpage you want to scrape
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},
            ]
        },
        # Use premium proxies to bypass difficult to scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a headless browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before the JavaScript rendering
        'return_page_source': False,
        # Return page screenshot as a PNG image
        'screenshot': False,
        # Take a full page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP code of the page requested
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    }
)
>>> response.text
'<!DOCTYPE html><html lang="en"><head>...'

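ScrapingBee accepts various parameters to render JavaScript, execute a custom JavaScript script, use a premium proxy from a specific geolocation, and more.

You can find all the supported parameters in the ScrapingBee documentation.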
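The request example passes empty strings for parameters such as country_code and wait_for. Whether the API treats an empty string the same as an omitted parameter is not stated here, so a cautious option is to drop unset values before building the request. The helper below is a minimal sketch; drop_unset is a hypothetical name and is not part of the SDK.

```python
def drop_unset(params):
    """Return a copy of params without None or empty-string values.

    Falsy but meaningful values such as False and 0 are kept.
    """
    return {k: v for k, v in params.items() if v is not None and v != ""}


# Hypothetical usage: only explicitly set parameters remain.
params = drop_unset({
    "render_js": True,
    "country_code": "",   # unset, will be dropped
    "wait_for": "",       # unset, will be dropped
    "window_width": 1920,
})
print(params)
```

The resulting dictionary can then be passed as the params argument of client.get.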

You can send custom cookies and headers just as you normally would with the requests library.
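Since POST requests are also supported, the same headers and cookies can presumably be forwarded with a POST. The sketch below assumes the client exposes a post method that accepts data, headers, and cookies in the same way requests.post does; post_form is a hypothetical wrapper, and the client is passed in so the function stays independent of any particular API key.

```python
def post_form(client, url, form_data, headers=None, cookies=None):
    """POST form_data to url through a ScrapingBeeClient-like object.

    Assumption: client.post(url, data=..., headers=..., cookies=...)
    mirrors the requests.post keyword arguments.
    """
    return client.post(url, data=form_data, headers=headers, cookies=cookies)


# Hypothetical usage (needs a real client and API key):
# client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')
# response = post_form(client, 'https://httpbin.org/post', {'KEY_1': 'VALUE_1'})
```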

Screenshot

Here is a short example of how to retrieve and store a screenshot of the ScrapingBee blog at its mobile resolution.

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Take a screenshot
        'screenshot': True,
        # Specify that we need the full height
        'screenshot_full_page': True,
        # Specify a mobile width in pixels
        'window_width': 375
    }
)

>>> if response.ok:
        with open("./scrapingbee_mobile.png", "wb") as f:
            f.write(response.content)
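A successful status does not by itself guarantee that the body is an image, so it can be worth checking the bytes before writing them to disk. The sketch below tests for the standard 8-byte PNG signature; save_png is a hypothetical helper, not part of the SDK, and it assumes the screenshot is returned as raw PNG bytes in response.content, as in the snippet above.

```python
# Standard 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(content, path):
    """Write content to path only if it starts with the PNG signature.

    Returns True when the file was written, False otherwise.
    """
    if not content.startswith(PNG_SIGNATURE):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


# Hypothetical usage with the response from the screenshot request:
# save_png(response.content, "./scrapingbee_mobile.png")
```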

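Using ScrapingBee with Scrapy

Scrapy is the most popular Python web scraping framework. You can easily integrate ScrapingBee's API with the Scrapy middleware.

Retries

The client includes a retry mechanism for 5XX responses.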

>>> from scrapingbee import ScrapingBeeClient

>>> client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

>>> response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        'render_js': True,
    },
    retries=5
)
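To illustrate what retrying on 5XX responses means, here is a generic sketch of such a loop. It is not the SDK's implementation: get_with_retries and its delay parameter are hypothetical, and treating retries as the total number of attempts is an assumption made for this example only.

```python
import time


def get_with_retries(fetch, retries=3, delay=0.0):
    """Call fetch() until it returns a non-5XX response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute. The last response is returned either way.
    """
    attempts = max(1, retries)
    response = None
    for attempt in range(attempts):
        response = fetch()
        if response.status_code < 500:
            break  # success or client error: no point retrying
        if attempt < attempts - 1:
            time.sleep(delay)  # optional pause before the next attempt
    return response


# Hypothetical usage with a client:
# response = get_with_retries(lambda: client.get(url), retries=5, delay=1.0)
```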


Additional information

Version: v2.0.1
Type: Other source code
Updated: 2025-02-15
Size: 11.38KB
Source: Github
