Why isn't this spider scraping any content?


item.py


 python


# -*- coding: utf-8 -*-
import scrapy
class BokeItem(scrapy.Item):
    url=scrapy.Field()
    title=scrapy.Field()
    content=scrapy.Field()

boke_spider.py


 python


# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider ,Rule
from scrapy.contrib.linkextractors import LinkExtractor
from boke.items import BokeItem

class BokeItem(CrawlSpider):
    name = 'blog'
    start_urls =['http://blog.sina.com.cn/s/blog_4701280b0102eo83.html']

    def parse_torrent(self,response):
        torrent=BokeItem()
        torrent['url']=response.url
        torrent['title']=response.xpath("//h2[@class='titName SG_txta']/text()").extract()[0]
        torrent['content']=response.xpath("//div[@style='min-height:22px']/text()").extract()[0]
        return  torrent

python scrapy

膜拜桂雏菊 11 years, 8 months ago

Try taking a look at this blog; it is dedicated to Scrapy.

常盘台的电磁炮 answered 11 years, 8 months ago

Try reading the official docs.

只是只瓜瓜 answered 11 years, 8 months ago

from scrapy.contrib.spiders import CrawlSpider ,Rule

You are subclassing CrawlSpider, but you clearly haven't defined any rules, so nothing ever routes a response to parse_torrent.

I suggest switching to the Spider class and renaming parse_torrent to parse, as follows:


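from scrapy import Spider  # Spider comes from the top-level scrapy package, not scrapy.contrib.spiders
from boke.items import BokeItem

class BokeSpider(Spider):  # renamed so the spider no longer shadows the imported BokeItem
    # name and start_urls stay the same; parse_torrent is renamed to parse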

Hamono answered 11 years, 8 months ago
