Python library to scrape data, especially media links like videos and photos, from vk.com URLs.
You can use it via the command line or as a Python library; check the documentation.
You can install the most recent release from PyPI via pip install vk-url-scraper.
Currently you need to manually uninstall and re-install one dependency (as it is installed from GitHub and not PyPI):
pip uninstall vk-api
pip install git+https://github.com/python273/vk_api.git@b99dac0ec2f832a6c4b20bde49869e7229ce4742
To use the library you will need a valid username/password combination for vk.com.
# run this to learn more about the parameters
vk_url_scraper --help
# scrape a URL and get the JSON result in the console
vk_url_scraper --username "username here" --password "password here" --urls https://vk.com/wall12345_6789
# OR
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789
# you can also have multiple urls
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789 https://vk.com/photo-12345_6789 https://vk.com/video12345_6789
# you can pass a token as well to avoid always authenticating
# and possibly getting captcha prompts
# you can fetch the token from the generated vk_config.v2.json file by searching for "access_token"
vk_url_scraper -u "username" -p "password" -t "vktoken goes here" --urls https://vk.com/wall12345_6789
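The token lookup mentioned above can be scripted. A minimal sketch, assuming only that vk_config.v2.json is valid JSON containing an "access_token" key somewhere (the exact nesting is vk_api's internal format and may vary):

```python
import json

def find_access_token(obj):
    """Recursively search a parsed JSON structure for an "access_token" key."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "access_token":
                return value
            found = find_access_token(value)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_access_token(item)
            if found is not None:
                return found
    return None

# usage sketch:
# with open("vk_config.v2.json") as f:
#     token = find_access_token(json.load(f))
```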
# save the JSON output into a file
vk_url_scraper -u "username here" -p "password here" --urls https://vk.com/wall12345_6789 > output.json
# download any photos or videos found in these URLS
# this will use or create an output/ folder and dump the files there
vk_url_scraper -u "username here" -p "password here" --download --urls https://vk.com/wall12345_6789
# or
vk_url_scraper -u "username here" -p "password here" -d --urls https://vk.com/wall12345_6789
from vk_url_scraper import VkScraper
vks = VkScraper("username", "password")
# scrape any "photo" URL
res = vks.scrape("https://vk.com/photo1_278184324?rev=1")
# scrape any "wall" URL
res = vks.scrape("https://vk.com/wall-1_398461")
# scrape any "video" URL
res = vks.scrape("https://vk.com/video-6596301_145810025")
print(res[0]["text"])  # e.g. prints the text content of the first scraped post
# Every scrape* function returns a list of dicts like:
{
    "id": "wall_id",
    "text": "text in this post",
    "datetime": utc datetime of post,
    "attachments": {
        # keys only present if photos, videos, or links exist
        "photo": [list of urls with max quality],
        "video": [list of urls with max quality],
        "link": [list of urls with max quality],
    },
    "payload": "original JSON response converted to dict, which you can parse for more data",
}
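Given results shaped like the dict above, the media URLs can be collected in one pass. A sketch (attachment keys are only present when that media type exists, so the code guards for missing keys):

```python
def collect_media_urls(results):
    """Flatten the photo/video/link URLs from a list of scrape results."""
    urls = []
    for post in results:
        attachments = post.get("attachments") or {}
        for kind in ("photo", "video", "link"):
            urls.extend(attachments.get(kind, []))
    return urls
```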
See [docs] for all available functions.
(more info in CONTRIBUTING.md).
pip install -r dev-requirements.txt
or pipenv install -r dev-requirements.txt
pip install -r requirements.txt
or pipenv install -r requirements.txt
make run-checks
(fixes style) or individually:
black . and isort .
to fix style,
flake8 .
to validate lint, and
mypy .
for static type checks.
pytest .
(or pytest -v --color=yes --doctest-modules tests/ vk_url_scraper/ to use verbose output, colors, and test docstring examples)
make docs
to generate sphinx docs -> edit config.py if needed
To test the command line interface available in main.py you need to pass the -m
option to python like so: python -m vk_url_scraper -u "" -p "" --urls ...
pipenv run pip freeze > requirements.txt
if you manage libs with pipenv
vk-api==11.9.9
../scripts/release.sh
to create a tag and push; alternatively
git tag vx.y.z
to tag the version, then
git push origin vx.y.z
-> this will trigger the workflow and put the project on PyPI.
If for some reason the GitHub Actions release workflow failed with an error that needs to be fixed, you'll have to delete both the tag and corresponding release from GitHub. After you've pushed a fix, delete the tag from your local clone with
git tag -l | xargs git tag -d && git fetch -t
Then repeat the steps above.
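The remote tag can also be deleted from the command line with git push --delete. The sketch below demonstrates the pattern against a throwaway local "origin" so it can run anywhere; on GitHub the same command works against the real remote (v0.0.1 stands in for your vx.y.z):

```shell
set -e
# throwaway sandbox so the commands can run without touching a real remote
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"            # stand-in for the GitHub remote
git init -q "$tmp/work" && cd "$tmp/work"
git config user.email you@example.com && git config user.name you
git remote add origin "$tmp/origin.git"
git commit -q --allow-empty -m "release"
git tag v0.0.1 && git push -q origin v0.0.1     # tag now exists on the remote

# the actual fix-up: remove the tag from the remote, then locally
git push -q --delete origin v0.0.1
git tag -d v0.0.1
```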