
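Pinterest Image Downloader (pinterest-dl)

This library helps you scrape and download images from Pinterest. Using Selenium for automation, it lets you extract images from a given Pinterest URL and save them to a directory of your choice.

It includes a CLI for direct use and a Python API for programmatic access. The tool can scrape images from both public and private boards and pins, using browser cookies for the private ones. It can also save the scraped URLs to a JSON file for later use.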

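Disclaimer:
This project is independent and not affiliated with Pinterest. It is designed for educational purposes only. Be aware that automated scraping of websites may conflict with their terms of service. The repository owner assumes no responsibility for misuse of this tool. Use it responsibly and at your own legal risk.

Note:
This project was inspired by pinterest-image-scraper.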

Features

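✅ Scrape images directly from Pinterest URLs.
✅ Asynchronously download images from a list of URLs. (see pull request)
✅ Save scraped URLs to a JSON file for future access.
✅ Incognito mode keeps your scraping discreet.
✅ Verbose output for effective debugging.
✅ Firefox browser support.
✅ Embed each image's alt text in the downloaded file as a `comment` metadata field for easier searching.
✅ Scrape private boards and pins using browser cookies. (see pull request)
✅ Scrape images through the reverse-engineered Pinterest API. (This is the default behavior. You can use the webdriver instead by specifying --client chrome or --client firefox.) (see pull request)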

Known Issues

Not compatible with Pinterest URLs that contain search queries.
Not rigorously tested on Linux and Mac. Please open an issue to report any bugs.

Requirements

Python 3.10 or later
Chrome or Firefox browser

Installation

Using pip (recommended)

pip install pinterest-dl

Cloning from GitHub

git clone https://github.com/sean1832/pinterest-dl.git
cd pinterest-dl
pip install .

CLI Usage

General command structure

pinterest-dl [command] [options]

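Examples

Scrape images in anonymous mode:

In anonymous mode (no login), scrape images from the Pinterest URL https://www.pinterest.com/pin/1234567 into the ./images/art directory, with a limit of 30 images and a minimum resolution of 512x512. Save the scraped URLs to a JSON file.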

pinterest-dl scrape "https://www.pinterest.com/pin/1234567" "images/art" -l 30 -r 512x512 --json
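The -r option takes a WIDTHxHEIGHT string, while the Python API's min_resolution parameter takes a (width, height) tuple. If you script around both, a small conversion helper keeps them consistent. The sketch below is illustrative only; parse_resolution is a hypothetical name and not part of pinterest-dl.

```python
import re

# Hypothetical helper: converts a CLI-style "WIDTHxHEIGHT" string into the
# (width, height) tuple shape used by the Python API's min_resolution.
_RESOLUTION = re.compile(r"^\s*(\d+)\s*[xX]\s*(\d+)\s*$")


def parse_resolution(text: str) -> tuple[int, int]:
    """Parse strings such as "512x512" into (512, 512)."""
    match = _RESOLUTION.match(text)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {text!r}")
    width, height = (int(group) for group in match.groups())
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height
```

For example, parse_resolution("512x512") gives (512, 512), the same value passed as min_resolution in the Python API examples.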

Get browser cookies:

Log in to Pinterest and save the browser cookies to cookies.json, running in headful mode (with a visible browser window).

pinterest-dl login -o cookies.json --headful

Tip

You will be prompted for your Pinterest email and password. The tool then saves the browser cookies to the specified file for future use.
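Before scraping, you may want to confirm that the login actually captured cookies. The sketch below assumes, without confirmation from this README, that cookies.json holds a JSON list of Selenium-style cookie objects with "name" and "value" keys; cookie_values is a hypothetical helper.

```python
import json


def cookie_values(raw_json: str) -> dict[str, str]:
    """Map cookie names to values from the text of a saved cookies file.

    Assumption: the file is a JSON list of Selenium-style cookie dicts,
    each with at least "name" and "value" keys. Check your own
    cookies.json before relying on this format.
    """
    cookies = json.loads(raw_json)
    if not isinstance(cookies, list):
        raise ValueError("expected a JSON list of cookie objects")
    # If two entries share a name, the later one wins.
    return {cookie["name"]: cookie["value"] for cookie in cookies}
```

For example, cookie_values(Path("cookies.json").read_text(encoding="utf-8")) (with `from pathlib import Path`) returning an empty dict would suggest that the login did not complete.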

Scrape a private board:

Scrape images from a private Pinterest board using the cookies saved in cookies.json.

pinterest-dl scrape "https://www.pinterest.com/pin/1234567" "images/art" -l 30 -c cookies.json

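Tip

Use the --client option to scrape with the chrome or firefox webdriver instead. This is slower but more reliable. It opens the browser in headless mode to scrape images; add the --headful flag to run the browser in a visible window.

Download images:

Download the images listed in art.json into the ./downloaded_imgs directory, with a minimum resolution of 1024x1024.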

pinterest-dl download art.json -o downloaded_imgs -r 1024x1024

Commands

1. Login

Log in to Pinterest with your credentials to obtain browser cookies for scraping private boards and pins.

Syntax:

pinterest-dl login [options]

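Options:

-o, --output [file]: File to save the browser cookies to for future use. (default: cookies.json)
--client: Select the scraping client (chrome / firefox). (default: chrome)
--headful: Run in headful mode with a visible browser window.
--verbose: Enable verbose output for debugging.
--incognito: Enable incognito mode for scraping.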

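Tip

After you run the login command, you will be prompted for your Pinterest email and password. The tool then saves the browser cookies to the specified file for future use (./cookies.json if no file is specified).

2. Scrape

Extract images from the specified Pinterest URL.

Syntax: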

pinterest-dl scrape [url] [output_dir] [options]

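Options:

-c, --cookies [file]: File containing browser cookies for private boards/pins. Run the login command to obtain cookies.
-l, --limit [number]: Maximum number of images to download (default: 100).
-r, --resolution [width]x[height]: Minimum resolution of downloaded images (e.g., 512x512).
--timeout [second]: Request timeout in seconds (default: 3).
--json: Save the scraped URLs to a JSON file.
--dry-run: Scrape without downloading images.
--verbose: Enable verbose output for debugging.
--client: Select the scraping client (api / chrome / firefox). (default: api)
--incognito: Enable incognito mode for scraping. (chrome/firefox only)
--headful: Run in headful mode with a visible browser window. (chrome/firefox only)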
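When calling the CLI from a script, building the argument list programmatically avoids the stray-space quoting mistakes that are easy to make by hand. The sketch below maps the options listed above onto a list suitable for subprocess.run. build_scrape_cmd is hypothetical, and its choice to reject --headful/--incognito with the api client is an assumption based on the "chrome/firefox only" notes, not documented CLI behavior.

```python
from typing import Optional

BROWSER_CLIENTS = {"chrome", "firefox"}
ALL_CLIENTS = {"api"} | BROWSER_CLIENTS


def build_scrape_cmd(
    url: str,
    output_dir: str,
    *,
    limit: int = 100,
    resolution: Optional[str] = None,
    cookies: Optional[str] = None,
    client: str = "api",
    save_json: bool = False,
    headful: bool = False,
    incognito: bool = False,
) -> list[str]:
    """Assemble a `pinterest-dl scrape` argument list (hypothetical wrapper)."""
    if client not in ALL_CLIENTS:
        raise ValueError(f"unknown client {client!r}")
    # Assumption: treat browser-only flags with the api client as a mistake.
    if (headful or incognito) and client not in BROWSER_CLIENTS:
        raise ValueError("--headful/--incognito only apply to chrome/firefox")
    cmd = ["pinterest-dl", "scrape", url, output_dir, "-l", str(limit)]
    if resolution is not None:
        cmd += ["-r", resolution]
    if cookies is not None:
        cmd += ["-c", cookies]
    if client != "api":  # api is the documented default, so omit it
        cmd += ["--client", client]
    if save_json:
        cmd.append("--json")
    if headful:
        cmd.append("--headful")
    if incognito:
        cmd.append("--incognito")
    return cmd
```

For example, subprocess.run(build_scrape_cmd("https://www.pinterest.com/pin/1234567", "images/art", limit=30, resolution="512x512", save_json=True), check=True) (after `import subprocess`) corresponds to the anonymous-mode example above. Because each argument is a separate list item, no shell quoting is involved.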

3. Download

Download images from a list of URLs provided in a file.

Syntax:

pinterest-dl download [url_list] [options]

Options:

-o, --output [directory]: Output directory (default: ./<json_filename>).
-r, --resolution [width]x[height]: Minimum resolution of downloaded images (e.g., 512x512).
--verbose: Enable verbose output.
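The default output directory is documented only as ./<json_filename>. Assuming that means a folder in the current directory named after the URL-list file without its extension (art.json becomes ./art), the rule can be sketched as follows; resolve_output_dir is hypothetical and not taken from the tool's source.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(url_list: str, output: Optional[str] = None) -> Path:
    """Return the directory a download would write to (hypothetical sketch).

    An explicit -o value wins; otherwise, assume the default is the URL-list
    file's stem placed in the current directory.
    """
    if output is not None:
        return Path(output)
    return Path(".") / Path(url_list).stem
```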

Python API

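You can also use the PinterestDL class directly in your Python code to scrape and download images programmatically.

1. Quick scrape and download

The following example shows how to scrape and download images from a Pinterest URL in a single step.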

from pinterest_dl import PinterestDL

# Initialize and run the Pinterest image downloader with specified settings
images = PinterestDL.with_api(
    timeout=3,  # Timeout in seconds for each request (default: 3)
    verbose=False,  # Enable detailed logging for debugging (default: False)
).scrape_and_download(
    url="https://www.pinterest.com/pin/1234567",  # Pinterest URL to scrape
    output_dir="images/art",  # Directory to save downloaded images
    limit=30,  # Max number of images to download
    min_resolution=(512, 512),  # Minimum resolution for images (width, height) (default: None)
    json_output="art.json",  # File to save URLs of scraped images (default: None)
    dry_run=False,  # If True, performs a scrape without downloading images (default: False)
    add_captions=True,  # Adds image `alt` text as metadata to images (default: False)
)

2. Scrape private boards with cookies

2a. Get cookies

To scrape private boards and pins, you first need to log in to Pinterest and obtain browser cookies.

import os
import json

from pinterest_dl import PinterestDL

# Make sure you don't expose your password in the code.
email = input("Enter Pinterest email: ")
password = os.getenv("PINTEREST_PASSWORD")

# Initialize browser and login to Pinterest
cookies = PinterestDL.with_browser(
    browser_type="chrome",
    headless=True,
).login(email, password).get_cookies(
    after_sec=7,  # Time to wait before capturing cookies. Login may take time.
)

# Save cookies to a file
with open("cookies.json", "w") as f:
    json.dump(cookies, f, indent=4)
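The example above reads the password from the PINTEREST_PASSWORD environment variable, and os.getenv returns None when that variable is unset. One way to handle that case, sketched here with a hypothetical read_password helper, is to fall back to a hidden interactive prompt from the standard getpass module.

```python
import getpass
import os
from typing import Mapping


def read_password(
    env_var: str = "PINTEREST_PASSWORD",
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the Pinterest password without hard-coding it (hypothetical).

    Reads `env_var` from `environ` first; if it is missing or empty, falls
    back to a prompt that does not echo the typed characters.
    """
    password = environ.get(env_var)
    if password:
        return password
    return getpass.getpass("Enter Pinterest password: ")
```

In the login example, password = read_password() would then replace the os.getenv call. The environ parameter exists mainly so the lookup can be exercised without touching the real environment.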

2b. Scrape with cookies

Once you have the cookies, you can use them to scrape private boards and pins.

from pinterest_dl import PinterestDL

# Initialize and run the Pinterest image downloader with specified settings
images = (
    PinterestDL.with_api()
    .with_cookies(
        "cookies.json",  # Path to cookies file
    )
    .scrape_and_download(
        url="https://www.pinterest.com/pin/1234567",  # Assume this is a private board URL
        output_dir="images/art",  # Directory to save downloaded images
        limit=30,  # Max number of images to download
    )
)

3. Detailed scraping with low-level control

Use this example if you need finer-grained control over scraping and downloading images.

3a. With the API

import json

from pinterest_dl import PinterestDL

# 1. Initialize PinterestDL with API.
scraped_images = PinterestDL.with_api().scrape(
    url="https://www.pinterest.com/pin/1234567",  # URL of the Pinterest page
    limit=30,  # Maximum number of images to scrape
    min_resolution=(512, 512),  # <- Only settable in API mode; in browser mode, prune after download instead.
)

# 2. Save Scraped Data to JSON
# Convert scraped data into a dictionary and save it to a JSON file for future access
images_data = [img.to_dict() for img in scraped_images]
with open("art.json", "w") as f:
    json.dump(images_data, f, indent=4)

# 3. Download Images
# Download images to a specified directory
downloaded_imgs = PinterestDL.download_images(images=scraped_images, output_dir="images/art")

valid_indices = list(range(len(downloaded_imgs)))  # All downloaded images are eligible for captions

# 4. Add Alt Text as Metadata
# Extract `alt` text from images and set it as metadata in the downloaded files
PinterestDL.add_captions(images=downloaded_imgs, indices=valid_indices)

3b. With a browser

import json

from pinterest_dl import PinterestDL

# 1. Initialize PinterestDL with a browser.
scraped_images = PinterestDL.with_browser(
    browser_type="chrome",  # Browser type to use ('chrome' or 'firefox')
    headless=True,  # Run browser in headless mode
).scrape(
    url="https://www.pinterest.com/pin/1234567",  # URL of the Pinterest page
    limit=30,  # Maximum number of images to scrape
)

# 2. Save Scraped Data to JSON
# Convert scraped data into a dictionary and save it to a JSON file for future access
images_data = [img.to_dict() for img in scraped_images]
with open("art.json", "w") as f:
    json.dump(images_data, f, indent=4)

# 3. Download Images
# Download images to a specified directory
downloaded_imgs = PinterestDL.download_images(images=scraped_images, output_dir="images/art")

# 4. Prune Images by Resolution
# Remove images that do not meet the minimum resolution criteria
valid_indices = PinterestDL.prune_images(images=downloaded_imgs, min_resolution=(200, 200))

# 5. Add Alt Text as Metadata
# Extract `alt` text from images and set it as metadata in the downloaded files
PinterestDL.add_captions(images=downloaded_imgs, indices=valid_indices)
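In 3b, prune_images returns the indices of the downloaded images that meet the minimum resolution, and add_captions then processes only those indices. The selection rule can be sketched over plain (width, height) pairs as below. This is not PinterestDL's implementation; indices_meeting_resolution is hypothetical, and requiring both dimensions to meet the minimum is an assumption.

```python
from typing import Optional, Sequence, Tuple


def indices_meeting_resolution(
    sizes: Sequence[Optional[Tuple[int, int]]],
    min_resolution: Tuple[int, int],
) -> list[int]:
    """Return positions of images at least min_resolution in both dimensions.

    A size of None stands for an image that could not be read; it is skipped.
    """
    min_w, min_h = min_resolution
    return [
        i
        for i, size in enumerate(sizes)
        if size is not None and size[0] >= min_w and size[1] >= min_h
    ]
```

For example, with sizes [(800, 600), (150, 300), None, (200, 200)] and a minimum of (200, 200), the result is [0, 3]: the second image is too narrow and the third could not be read.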