pyspider download - pyspider source code download

pyspider

Python

v0.3.10

Spider

A powerful spider (web crawler) system written in Python.

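Write scripts in Python
Powerful WebUI with a script editor, task monitor, project manager, and result viewer
MySQL, MongoDB, Redis, SQLite, and Elasticsearch as database backends; PostgreSQL via SQLAlchemy
RabbitMQ, Redis, and Kombu as message queues
Task priority, retries, periodic crawling, recrawling by age, and more
Distributed architecture, crawling of JavaScript pages, support for Python 2.{6,7} and 3.{3,4,5,6}, and more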
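
The "task priority" and "recrawl by age" features above can be illustrated with a small standalone sketch. It does not use pyspider and is not how pyspider's scheduler is implemented; the names `is_stale` and `TaskQueue` are hypothetical, and the rule "a page is stale once more than `age` seconds have passed since its last crawl" is an assumption made for illustration.

```python
import heapq
import time


def is_stale(last_crawled, age, now=None):
    """Return True if a page should be crawled again.

    last_crawled: Unix timestamp of the previous crawl, or None if never crawled.
    age: validity period in seconds (assumed semantics, see lead-in).
    """
    if last_crawled is None:
        return True
    if now is None:
        now = time.time()
    return now - last_crawled > age


class TaskQueue:
    """Hypothetical queue: higher priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker that preserves insertion order

    def put(self, url, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

For example, a task put with `priority=9` is returned before one put with `priority=1`, and a page last crawled 11 seconds ago with `age=10` is reported as stale.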

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample code

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Project-wide crawl settings (empty here, so defaults apply).
    crawl_config = {
    }

    # Run on_start once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as valid for 10 days before recrawling it.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is stored as the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
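
To make the extraction steps in the sample concrete, here is a rough standalone sketch of what `index_page` and `detail_page` pull out of a page: the links matched by `a[href^="http"]` and the text of `<title>`. It uses only the standard library `html.parser` rather than pyspider's `response.doc`, so the names `AbsoluteLinkParser` and `extract` are hypothetical and the matching is a simplification of a real CSS selector engine.

```python
from html.parser import HTMLParser


class AbsoluteLinkParser(HTMLParser):
    """Collect <a> hrefs that start with "http" and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Same prefix test as the selector a[href^="http"].
            if href and href.startswith("http"):
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html):
    """Return the page title and the absolute links found in an HTML string."""
    parser = AbsoluteLinkParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}
```

Relative links such as `/about` and anchors without an `href` are skipped, which mirrors why the sample handler only follows absolute URLs.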