Install the kagglehub package with pip:
pip install kagglehub
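Authentication is only needed when accessing public resources that require user consent, or private resources.
First, you will need a Kaggle account. You can sign up here.
After logging in, you can download your Kaggle API credentials at https://www.kaggle.com/settings by clicking the "Create New Token" button under the "API" section.
You have 3 different options to authenticate.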
Calling kagglehub.login() prompts you to enter your username and token:
import kagglehub
kagglehub.login()
You can also choose to export your Kaggle username and token to the environment:
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
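In a notebook or script, the same two variables can be set from Python before any kagglehub call is made. The sketch below is illustrative only: `kaggle_env_credentials` is a hypothetical helper (not part of kagglehub) that simply checks whether both variables are present, and the values are the placeholders used above.

```python
import os


def kaggle_env_credentials(env=None):
    """Return (username, key) from the environment, or None if either is missing.

    Hypothetical helper for illustration; kagglehub does its own lookup.
    """
    env = os.environ if env is None else env
    username = env.get("KAGGLE_USERNAME")
    key = env.get("KAGGLE_KEY")
    if username and key:
        return username, key
    return None


# Equivalent of the `export` lines above, done from Python (placeholder values).
os.environ["KAGGLE_USERNAME"] = "datadinosaur"
os.environ["KAGGLE_KEY"] = "xxxxxxxxxxxxxx"

if kaggle_env_credentials() is None:
    print("KAGGLE_USERNAME and KAGGLE_KEY are not both set")
```

Variables set this way only last for the current Python process, which is usually what you want in a notebook.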
Alternatively, credentials can be read from a kaggle.json file.
Store your kaggle.json credentials file at ~/.kaggle/kaggle.json.
You can also set the KAGGLE_CONFIG_DIR environment variable to change this location to $KAGGLE_CONFIG_DIR/kaggle.json.
Note for Windows users: the default directory is %HOMEPATH%/kaggle.json.
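The lookup described above can be sketched in a few lines. This is a minimal illustration, not kagglehub's own code: `kaggle_json_path` and `load_kaggle_json` are hypothetical helpers, the fallback to `~/.kaggle` is assumed for every platform, and the file is assumed to be a JSON object with `username` and `key` fields.

```python
import json
import os


def kaggle_json_path(env=None):
    """Return the expected kaggle.json location (illustrative helper).

    Uses $KAGGLE_CONFIG_DIR when set, otherwise ~/.kaggle (an assumption here).
    """
    env = os.environ if env is None else env
    config_dir = env.get("KAGGLE_CONFIG_DIR") or os.path.join(
        os.path.expanduser("~"), ".kaggle"
    )
    return os.path.join(config_dir, "kaggle.json")


def load_kaggle_json(path):
    """Read (username, key) from a kaggle.json file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["username"], data["key"]


print("Expected credentials file:", kaggle_json_path())
```

Printing the resolved path is a quick way to check which file would be picked up when KAGGLE_CONFIG_DIR is set in one shell but not another.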
In Colab, you can store your username and key token as Colab secrets named KAGGLE_USERNAME and KAGGLE_KEY.
For instructions on adding secrets in both Colab and Colab Enterprise, see this article.
The following examples download the answer-equivalence-bem variation of this Kaggle model: https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem
import kagglehub
# Download the latest version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem')
# Download a specific version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem/1')
# Download a single file.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', path='variables/variables.index')
# Download a model or file, even if previously downloaded to cache.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', force_download=True)
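Since the download call returns a local path, a common next step is to inspect what was fetched. The sketch below uses only the standard library; `list_downloaded_files` is a hypothetical helper, and the kagglehub call is left commented out because it needs the package and network access.

```python
import os


def list_downloaded_files(root):
    """Return sorted file paths under root, relative to root, with '/' separators."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Typical use (requires kagglehub and network access):
# import kagglehub
# path = kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem')
# for rel_path in list_downloaded_files(path):
#     print(rel_path)
```

The relative paths it prints are in the same form as the `path` argument used above to download a single file.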
Uploads a new variation (or a new version of the variation if it already exists).
import kagglehub
# For example, to upload a new variation to this model:
# - https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem
#
# You would use the following handle: `google/bert/tensorFlow2/answer-equivalence-bem`
handle = '<KAGGLE_USERNAME>/<MODEL>/<FRAMEWORK>/<VARIATION>'
local_model_dir = 'path/to/local/model/dir'
kagglehub.model_upload(handle, local_model_dir)
# You can also specify some version notes (optional)
kagglehub.model_upload(handle, local_model_dir, version_notes='improved accuracy')
# You can also specify a license (optional)
kagglehub.model_upload(handle, local_model_dir, license_name='Apache 2.0')
# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.models.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.model_upload(handle, local_model_dir, ignore_patterns=["original/", "*.tmp"])
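To get a feel for what patterns such as "original/" and "*.tmp" exclude, the sketch below approximates the matching with the standard `fnmatch` module. It is an assumption-laden illustration, not kagglehub's actual matching logic: `is_ignored` is a hypothetical helper, and the real library may treat edge cases differently.

```python
import fnmatch


def is_ignored(rel_path, patterns):
    """Approximate check of whether a '/'-separated relative path is excluded.

    Illustrative only; not kagglehub's implementation.
    """
    parts = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: exclude anything inside a matching directory.
            dir_pattern = pattern.rstrip("/")
            if any(fnmatch.fnmatch(d, dir_pattern) for d in parts[:-1]):
                return True
        elif fnmatch.fnmatch(parts[-1], pattern) or fnmatch.fnmatch(rel_path, pattern):
            # File pattern: match against the file name or the whole path.
            return True
    return False


patterns = ["original/", "*.tmp"]
for candidate in ["original/weights.bin", "model/scratch.tmp", "model/weights.bin"]:
    print(candidate, "ignored" if is_ignored(candidate, patterns) else "kept")
```

Running a check like this over your local directory before uploading can help catch a pattern that is broader than intended.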
The following examples download the Spotify Recommendation Kaggle dataset: https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation
import kagglehub
# Download the latest version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation')
# Download a specific version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation/versions/1')
# Download a single file
kagglehub.dataset_download('bricevergnou/spotify-recommendation', path='data.csv')
# Download a dataset or file, even if previously downloaded to cache.
kagglehub.dataset_download('bricevergnou/spotify-recommendation', force_download=True)
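Dataset handles in these examples follow the form `<owner>/<dataset>`, with an optional `/versions/<n>` suffix. When handles are built from variables, a small helper keeps the format in one place. `dataset_handle` below is a hypothetical convenience function written for this illustration, not part of kagglehub.

```python
def dataset_handle(owner, dataset, version=None):
    """Build a dataset handle string in the form used by the examples above."""
    for part in (owner, dataset):
        if not part or "/" in part:
            raise ValueError(f"invalid handle component: {part!r}")
    handle = f"{owner}/{dataset}"
    if version is not None:
        handle += f"/versions/{int(version)}"
    return handle


print(dataset_handle("bricevergnou", "spotify-recommendation"))
print(dataset_handle("bricevergnou", "spotify-recommendation", version=1))
```

The resulting string can be passed as the first argument to the download call.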
Uploads a new dataset (or a new version if it already exists).
import kagglehub
# For example, to upload a new dataset (or version) at:
# - https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation
#
# You would use the following handle: `bricevergnou/spotify-recommendation`
handle = '<KAGGLE_USERNAME>/<DATASET>'
local_dataset_dir = 'path/to/local/dataset/dir'
# Create a new dataset
kagglehub.dataset_upload(handle, local_dataset_dir)
# You can then create a new version of this existing dataset and include version notes (optional).
kagglehub.dataset_upload(handle, local_dataset_dir, version_notes='improved data')
# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.datasets.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.dataset_upload(handle, local_dataset_dir, ignore_patterns=["original/", "*.tmp"])
The following examples download the Digit Recognizer Kaggle competition: https://www.kaggle.com/competitions/digit-recognizer
import kagglehub
# Download the latest version.
kagglehub.competition_download('digit-recognizer')
# Download a single file
kagglehub.competition_download('digit-recognizer', path='train.csv')
# Download a competition or file, even if previously downloaded to cache.
kagglehub.competition_download('digit-recognizer', force_download=True)
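After downloading a single CSV file, a quick preview helps confirm that the expected file arrived. The sketch below uses only the standard `csv` module. `preview_csv` is a hypothetical helper, and it assumes that the download call with `path='train.csv'` returns the local path of that file.

```python
import csv
import itertools


def preview_csv(path, n_rows=5):
    """Return (header, rows), where rows holds at most n_rows data rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(itertools.islice(reader, n_rows))
    return header, rows


# Typical use (requires kagglehub, credentials, and network access):
# import kagglehub
# train_path = kagglehub.competition_download('digit-recognizer', path='train.csv')
# header, rows = preview_csv(train_path, n_rows=3)
# print(header[:5], len(rows))
```

Only the first few rows are read, so the preview stays cheap even for large files.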
We use hatch to manage this project.
Follow these instructions to install it.
# Run all tests for current Python version.
hatch test
# Run all tests for all Python versions.
hatch test --all
# Run all tests for a specific Python version.
hatch test -py 3.11
# Run a single test file
hatch test tests/test_<SOME_FILE>.py
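To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in the earlier sections of this document (see the authentication options above).
After setting up your credentials by any of these methods, you can run the integration tests as follows: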
# Run all tests
hatch test integration_tests
Run kagglehub from source:
# Download a model & print the path
hatch run python -c "import kagglehub; print('path: ', kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem'))"
# Lint check
hatch run lint:style
hatch run lint:typing
hatch run lint:all # for both
# Format
hatch run lint:fmt
# Run tests with a coverage report
hatch test --cover
# Build the package
hatch build
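Running hatch commands inside Docker is useful for working in a consistent environment and for switching easily between Python versions.
The following shows how to run hatch run lint:all, but this also works for any other hatch command: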
# Use default Python version
./docker-hatch run lint:all
# Use specific Python version (Must be a valid tag from: https://hub.docker.com/_/python)
./docker-hatch -v 3.9 run lint:all
# Run test in docker with specific Python version
./docker-hatch -v 3.9 test
To set up VS Code, install the recommended extensions.
Configure hatch to create virtual environments in the project folder.
hatch config set dirs.env.virtual .env
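After that, create all the Python environments needed by running hatch -e all run tests.
Finally, configure VS Code to use one of the selected environments: cmd + shift + p -> python: Select Interpreter -> pick one of the folders in ./.env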
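The kagglehub library is configured with automatic logging, which is stored in a log folder. The log destination is resolved via os.path.expanduser.
The table below lists the possible locations: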
OS | Log path |
---|---|
macOS | /user/$USERNAME/.kaggle/logs/kagglehub.log |
Linux | ~/.kaggle/logs/kagglehub.log |
Windows | C:\Users\%USERNAME%\.kaggle\logs\kagglehub.log |
Please include the log to help troubleshoot issues.
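When attaching logs to an issue, the last few dozen lines are often enough. The sketch below resolves the log path with os.path.expanduser, following the `~/.kaggle/logs/kagglehub.log` layout from the table, and returns the tail of the file. `default_log_path` and `tail_log` are hypothetical helpers for illustration, not part of kagglehub.

```python
import os
from collections import deque


def default_log_path():
    """Expand ~/.kaggle/logs/kagglehub.log for the current user."""
    return os.path.expanduser(os.path.join("~", ".kaggle", "logs", "kagglehub.log"))


def tail_log(path, n_lines=50):
    """Return the last n_lines lines of the log file as a list of strings."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(f, maxlen=n_lines)]


log_path = default_log_path()
if os.path.exists(log_path):
    print("\n".join(tail_log(log_path, n_lines=20)))
else:
    print("No log file found at", log_path)
```

Review the output for credentials or private paths before pasting it into a public issue.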