Pinterest Image Downloader (pinterest-dl)

This library makes it easy to scrape and download images from Pinterest. Using Selenium for automation, it extracts images from a specified Pinterest URL and saves them to a directory of your choice.

It includes a CLI for direct use and a Python API for programmatic access. The tool supports scraping images from public and private boards and pins using browser cookies. It can also save the scraped URLs to a JSON file for future access.

Disclaimer:
This project is independent and is not affiliated with Pinterest. It is designed for educational purposes only. Be aware that automating the scraping of websites may conflict with their terms of service. The repository owner accepts no responsibility for misuse of this tool. Use it responsibly and at your own legal risk.

Note:
This project is inspired by pinterest-image-scraper.

Features

✅ Scrape images directly from a Pinterest URL.
✅ Download images asynchronously from a list of URLs. (see pull request)
✅ Save scraped URLs to a JSON file for future access.
✅ Incognito mode for more discreet scraping.
✅ Verbose output for effective debugging.
✅ Firefox browser support.
✅ Insert each image's alt text into the downloaded file as a metadata comment, to make images easier to search.
✅ Scrape private boards and pins using browser cookies. (see pull request)
✅ Scrape images using a reverse-engineered Pinterest API. (This is the default behavior. You can use a webdriver instead by specifying --client chrome or --client firefox.) (see pull request)

Known issues

Not compatible with Pinterest URLs that contain a search query.
Not rigorously tested on Linux and Mac. Please create an issue to report bugs.

Requirements

Python 3.10 or later
Chrome or Firefox browser

Installation

Using pip (recommended)

pip install pinterest-dl

Cloning from GitHub

git clone https://github.com/sean1832/pinterest-dl.git
cd pinterest-dl
pip install .

CLI usage

General command structure

pinterest-dl [command] [options]

Examples

Scraping images in anonymous mode:

Scrape images in anonymous mode (without logging in) from the Pinterest URL https://www.pinterest.com/pin/1234567 into the ./images/art directory, with a limit of 30 images and a minimum resolution of 512x512. Save the scraped URLs to a JSON file.

pinterest-dl scrape "https://www.pinterest.com/pin/1234567" "images/art" -l 30 -r 512x512 --json
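
When the CLI is driven from a script, the same invocation can be assembled as an argument list. The following is a minimal sketch, assuming pinterest-dl is installed and on PATH. build_scrape_command is a hypothetical helper, not part of pinterest-dl, and it uses only flags documented in this README.

```python
import subprocess


def build_scrape_command(url, output_dir, limit=None, resolution=None,
                         save_json=False, cookies=None):
    """Assemble the argv list for `pinterest-dl scrape` (hypothetical helper)."""
    cmd = ["pinterest-dl", "scrape", url, output_dir]
    if limit is not None:
        cmd += ["-l", str(limit)]
    if resolution is not None:
        cmd += ["-r", resolution]
    if save_json:
        cmd.append("--json")
    if cookies is not None:
        cmd += ["-c", cookies]
    return cmd


cmd = build_scrape_command(
    "https://www.pinterest.com/pin/1234567",
    "images/art",
    limit=30,
    resolution="512x512",
    save_json=True,
)
print(" ".join(cmd))

# To actually run it (requires pinterest-dl to be installed):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs and paths that contain spaces.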

Getting browser cookies:

Obtain the browser cookies for a Pinterest login and save them to the cookies.json file, running in headful mode (with a browser window).

pinterest-dl login -o cookies.json --headful

Tip

You will be prompted for your Pinterest email address and password. The tool then saves the browser cookies to the specified file for future use.

Scraping a private board:

Scrape images from a private Pinterest board using the cookies stored in the cookies.json file.

pinterest-dl scrape "https://www.pinterest.com/pin/1234567" "images/art" -l 30 -c cookies.json

Tip

The --client option lets you scrape with the chrome or firefox webdriver instead. This is slower but more reliable. The browser opens in headless mode and scrapes the images. You can also pass the --headful flag to run the browser in a visible window.

Downloading images:

Download the images listed in the art.json file into the ./downloaded_imgs directory, with a minimum resolution of 1024x1024.

pinterest-dl download art.json -o downloaded_imgs -r 1024x1024

Commands

1. Login

Log in to Pinterest with your credentials to obtain the browser cookies needed to scrape private boards and pins.

Syntax:

pinterest-dl login [options]

Options:

-o, --output [file] : File in which to save the browser cookies for future use. (default: cookies.json)
--client : Choose the scraping client (chrome / firefox). (default: chrome)
--headful : Run in headful mode, with a browser window.
--verbose : Enable verbose output for debugging.
--incognito : Enable incognito mode for scraping.

Tip

After you enter the login command, you are prompted for your Pinterest email address and password. The tool then saves the browser cookies to the specified file for future use (./cookies.json if no file is specified).
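
Before passing a cookies file to the scrape command, a script may want to confirm that the login step actually produced one. The sketch below is one way to do that. cookies_file_ready is a hypothetical helper, and it assumes nothing about the cookie format beyond the file being non-empty, parseable JSON.

```python
import json
from pathlib import Path


def cookies_file_ready(path="cookies.json"):
    """Return True if `path` exists and contains parseable, non-empty JSON."""
    file = Path(path)
    if not file.is_file():
        return False
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # The file exists but is not valid JSON text.
        return False
    return bool(data)


if not cookies_file_ready("cookies.json"):
    print("No usable cookies file; run: pinterest-dl login -o cookies.json")
```

This check only verifies that the file is readable. It cannot tell whether the saved session is still valid on Pinterest's side.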

2. Scrape

Extract images from a specified Pinterest URL.

Syntax:

pinterest-dl scrape [url] [output_dir] [options]

Options:

-c, --cookies [file] : File containing the browser cookies for private boards/pins. Run the login command to obtain the cookies.
-l, --limit [number] : Maximum number of images to download (default: 100).
-r, --resolution [width]x[height] : Minimum image resolution to download (e.g. 512x512).
--timeout [second] : Request timeout in seconds (default: 3).
--json : Save the scraped URLs to a JSON file.
--dry-run : Run the scrape without downloading images.
--verbose : Enable verbose output for debugging.
--client : Choose the scraping client (api / chrome / firefox). (default: api)
--incognito : Enable incognito mode for scraping. (chrome / firefox only)
--headful : Run in headful mode, with a browser window. (chrome / firefox only)
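
The -r option takes a [width]x[height] string, while the Python API described later takes a (width, height) tuple for min_resolution. The following sketch shows one way to convert between the two. parse_resolution is a hypothetical helper and not necessarily how pinterest-dl parses the flag internally.

```python
def parse_resolution(text):
    """Convert a 'WIDTHxHEIGHT' string such as '512x512' into (width, height)."""
    width_str, sep, height_str = text.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {text!r}")
    # int() raises ValueError on empty or non-numeric parts.
    width, height = int(width_str), int(height_str)
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive, got {text!r}")
    return width, height


print(parse_resolution("512x512"))  # (512, 512)
```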

3. Download

Download images from a list of URLs given in a file.

Syntax:

pinterest-dl download [url_list] [options]

Options:

-o, --output [directory] : Output directory (default: ./<json_filename>).
-r, --resolution [width]x[height] : Minimum resolution to download (e.g. 512x512).
--verbose : Enable verbose output.
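
The default output directory is derived from the name of the URL list file. The sketch below illustrates that rule under the assumption that <json_filename> means the file name without its .json extension. default_output_dir is a hypothetical helper, not part of pinterest-dl.

```python
from pathlib import Path


def default_output_dir(url_list_path):
    """Derive ./<json_filename> from the URL list path (assumes the extension is dropped)."""
    return Path(".") / Path(url_list_path).stem


print(default_output_dir("art.json"))  # art
```

Under this assumption, running the download command on art.json without -o would place the images in ./art.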

Python API

You can also use the PinterestDL class directly in Python code to scrape and download images programmatically.

1. Quick scrape and download

The following example shows how to scrape and download images from a Pinterest URL in a single step.

from pinterest_dl import PinterestDL

# Initialize and run the Pinterest image downloader with specified settings
images = PinterestDL.with_api(
    timeout=3,  # Timeout in seconds for each request (default: 3)
    verbose=False,  # Enable detailed logging for debugging (default: False)
).scrape_and_download(
    url="https://www.pinterest.com/pin/1234567",  # Pinterest URL to scrape
    output_dir="images/art",  # Directory to save downloaded images
    limit=30,  # Max number of images to download
    min_resolution=(512, 512),  # Minimum resolution for images (width, height) (default: None)
    json_output="art.json",  # File to save URLs of scraped images (default: None)
    dry_run=False,  # If True, performs a scrape without downloading images (default: False)
    add_captions=True,  # Adds image `alt` text as metadata to images (default: False)
)

2. Scraping private boards with cookies

2a. Obtain cookies

To obtain the browser cookies needed to scrape private boards and pins, you must first log in to Pinterest.

import os
import json

from pinterest_dl import PinterestDL

# Make sure you don't expose your password in the code.
email = input("Enter Pinterest email: ")
password = os.getenv("PINTEREST_PASSWORD")

# Initialize browser and login to Pinterest
cookies = PinterestDL.with_browser(
    browser_type="chrome",
    headless=True,
).login(email, password).get_cookies(
    after_sec=7,  # Time to wait before capturing cookies. Login may take time.
)

# Save cookies to a file
with open("cookies.json", "w") as f:
    json.dump(cookies, f, indent=4)
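
Note that os.getenv returns None when PINTEREST_PASSWORD is not set, so the login above would receive no password. One way to handle that case is to fall back to an interactive prompt. The sketch below uses only the standard library. read_password is a hypothetical helper, not part of pinterest-dl.

```python
import getpass
import os


def read_password(env_var="PINTEREST_PASSWORD"):
    """Return the password from the environment, prompting only if it is unset or empty."""
    password = os.getenv(env_var)
    if not password:
        # getpass does not echo the typed characters to the terminal.
        password = getpass.getpass("Enter Pinterest password: ")
    return password
```

The result can then be passed to login in place of the bare os.getenv call, which keeps the password out of the source code in both cases.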

2b. Scrape with cookies

Once you have the cookies, you can use them to scrape private boards and pins.

from pinterest_dl import PinterestDL

# Initialize and run the Pinterest image downloader with specified settings
images = (
    PinterestDL.with_api()
    .with_cookies(
        "cookies.json",  # Path to cookies file
    )
    .scrape_and_download(
        url="https://www.pinterest.com/pin/1234567",  # Assume this is a private board URL
        output_dir="images/art",  # Directory to save downloaded images
        limit=30,  # Max number of images to download
    )
)

3. Detailed scraping with lower-level control

Use this example if you need finer control over how images are scraped and downloaded.

3a. With the API

import json

from pinterest_dl import PinterestDL

# 1. Initialize PinterestDL with API.
scraped_images = PinterestDL.with_api().scrape(
    url="https://www.pinterest.com/pin/1234567",  # URL of the Pinterest page
    limit=30,  # Maximum number of images to scrape
    min_resolution=(512, 512),  # <- Only available in API mode. In browser mode, prune by resolution after download.
)

# 2. Save Scraped Data to JSON
# Convert scraped data into a dictionary and save it to a JSON file for future access
images_data = [img.to_dict() for img in scraped_images]
with open("art.json", "w") as f:
    json.dump(images_data, f, indent=4)

# 3. Download Images
# Download images to a specified directory
downloaded_imgs = PinterestDL.download_images(images=scraped_images, output_dir="images/art")

valid_indices = list(range(len(downloaded_imgs)))  # All images are valid to add captions

# 4. Add Alt Text as Metadata
# Extract `alt` text from images and set it as metadata in the downloaded files
PinterestDL.add_captions(images=downloaded_imgs, indices=valid_indices)

3b. With a browser

import json

from pinterest_dl import PinterestDL

# 1. Initialize PinterestDL with a browser.
scraped_images = PinterestDL.with_browser(
    browser_type="chrome",  # Browser type to use ('chrome' or 'firefox')
    headless=True,  # Run browser in headless mode
).scrape(
    url="https://www.pinterest.com/pin/1234567",  # URL of the Pinterest page
    limit=30,  # Maximum number of images to scrape
)

# 2. Save Scraped Data to JSON
# Convert scraped data into a dictionary and save it to a JSON file for future access
images_data = [img.to_dict() for img in scraped_images]
with open("art.json", "w") as f:
    json.dump(images_data, f, indent=4)

# 3. Download Images
# Download images to a specified directory
downloaded_imgs = PinterestDL.download_images(images=scraped_images, output_dir="images/art")

# 4. Prune Images by Resolution
# Remove images that do not meet the minimum resolution criteria
valid_indices = PinterestDL.prune_images(images=downloaded_imgs, min_resolution=(200, 200))

# 5. Add Alt Text as Metadata
# Extract `alt` text from images and set it as metadata in the downloaded files
PinterestDL.add_captions(images=downloaded_imgs, indices=valid_indices)
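
In both variants, valid_indices tells add_captions which downloaded images to caption: every image in the API variant, and only those that survive pruning in the browser variant. The sketch below illustrates the idea of selecting indices by minimum resolution on plain (width, height) tuples. It is not the library's implementation, the sizes are made up, and it assumes an image must meet the minimum in both dimensions.

```python
def indices_meeting_resolution(sizes, min_resolution):
    """Return indices of (width, height) pairs that meet the minimum in both dimensions."""
    min_w, min_h = min_resolution
    return [i for i, (w, h) in enumerate(sizes) if w >= min_w and h >= min_h]


# Hypothetical image sizes, in download order.
sizes = [(1024, 768), (150, 150), (200, 300), (640, 199)]
valid = indices_meeting_resolution(sizes, (200, 200))
print(valid)  # [0, 2]
```

Keeping indices rather than a filtered list preserves the link between each downloaded file and the scraped entry that carries its alt text.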