Download pyspider - pyspider source code download

pyspider

Python

v0.3.10

spider

A powerful spider (web crawler) system in Python.

Write scripts in Python
Powerful WebUI with script editor, task monitor, project manager, and result viewer
MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backend
RabbitMQ, Redis, and Kombu as message queue
Task priority, retry, periodic crawling, recrawl by age, etc.
Distributed architecture, crawling of JavaScript pages, support for Python 2.{6,7}, 3.{3,4,5,6}, etc.
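
The "task priority" and "recrawl by age" features above can be illustrated with a small standalone sketch. This is not pyspider's scheduler. The names needs_crawl and PriorityTaskQueue are hypothetical, and the sketch assumes that a higher priority number is served first and that a page counts as stale once more than `age` seconds have passed since its last crawl.

```python
import heapq
import time


def needs_crawl(last_crawled, age, now=None):
    """Return True if the page was never crawled or its last crawl is older than `age` seconds."""
    now = time.time() if now is None else now
    return last_crawled is None or (now - last_crawled) > age


class PriorityTaskQueue:
    """Hypothetical queue: higher priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion order, used as a tie-breaker

    def put(self, url, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def get(self):
        return heapq.heappop(self._heap)[2]


queue = PriorityTaskQueue()
queue.put("http://example.com/low", priority=1)
queue.put("http://example.com/high", priority=5)
first = queue.get()  # the priority-5 URL is returned before the priority-1 URL
```

In a real pyspider script these concerns are expressed declaratively, as in the sample code's @config(age=...) decorator, rather than implemented by hand.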

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
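
To make the index_page step more concrete, here is a rough standard-library approximation of what the selector a[href^="http"] picks out: anchor tags whose href starts with "http". The names HttpLinkCollector, extract_http_links, and SAMPLE_HTML are hypothetical and exist only for this illustration. pyspider itself relies on its own response.doc object, which may also resolve relative links to absolute ones before the selector runs; this sketch does not do that.

```python
from html.parser import HTMLParser


class HttpLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose href starts with 'http'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and hrefs that are relative paths.
        if href and href.startswith("http"):
            self.links.append(href)


def extract_http_links(html):
    collector = HttpLinkCollector()
    collector.feed(html)
    return collector.links


SAMPLE_HTML = (
    '<a href="http://example.com/a">A</a>'
    '<a href="/relative">B</a>'
    '<a href="https://example.com/c">C</a>'
    '<a>no href</a>'
)

links = extract_http_links(SAMPLE_HTML)
# links == ['http://example.com/a', 'https://example.com/c']
```

Each collected link corresponds to one self.crawl(...) call in index_page, which schedules detail_page for that URL.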

Installation

pip install pyspider
run the command pyspider, then visit http://localhost:5000/

WARNING: The WebUI is open to the public by default and can be used to execute any command, which may harm your system. Use it on an internal network or enable need-auth for the webui.
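
As a hedged illustration of enabling need-auth, the sketch below builds a JSON configuration with a "webui" section. The key names "username", "password", and "need-auth" follow the layout shown in pyspider's deployment documentation as best recalled, so check them against the documentation for your version. The helper build_webui_auth_config and the credential values are placeholders, not part of pyspider.

```python
import json


def build_webui_auth_config(username, password):
    """Return a config dict that asks the WebUI to require authentication (assumed key names)."""
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


# Placeholder credentials; replace them with real values before use.
config = build_webui_auth_config("some_name", "some_passwd")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Under the same assumption, the printed JSON could be saved as config.json and passed to pyspider with the -c option (pyspider -c config.json).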

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/