Why doesn't this spider scrape any content?


items.py


```python
# -*- coding: utf-8 -*-
import scrapy


class BokeItem(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()
    content = scrapy.Field()
```

boke_spider.py


```python
# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from boke.items import BokeItem


class BokeItem(CrawlSpider):
    name = 'blog'
    start_urls = ['http://blog.sina.com.cn/s/blog_4701280b0102eo83.html']

    def parse_torrent(self, response):
        torrent = BokeItem()
        torrent['url'] = response.url
        torrent['title'] = response.xpath("//h2[@class='titName SG_txta']/text()").extract()[0]
        torrent['content'] = response.xpath("//div[@style='min-height:22px']/text()").extract()[0]
        return torrent
```
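Separately from how the callback gets called, the spider class above is also named `BokeItem`, which re-binds the name imported from `boke.items`. Inside `parse_torrent`, `BokeItem()` is looked up at call time, so it would build another spider object rather than an item, and `torrent['url'] = ...` would then fail. The plain-Python sketch below reproduces the effect without Scrapy; the `dict` subclass is only a hypothetical stand-in for the real `scrapy.Item` subclass.

```python
# Hypothetical stand-in for the item class from items.py (a dict subclass, for illustration only)
class BokeItem(dict):
    pass


item_class = BokeItem  # keep a reference to the original item class


# Defining the spider with the same name re-binds the module-level name BokeItem
class BokeItem:
    def parse_torrent(self):
        # BokeItem is resolved when this runs, so it now refers to this "spider" class
        return BokeItem()


created = BokeItem().parse_torrent()
print(isinstance(created, item_class))  # False: the spider class was instantiated, not the item

try:
    created["url"] = "http://example.com"  # what parse_torrent does next
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: 'BokeItem' object does not support item assignment
```

Giving the spider its own name (for example `BokeSpider`) avoids the clash.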

python scrapy

膜拜桂雏菊 11 years, 10 months ago

Try this blog; it's dedicated to Scrapy.

常盘台的电磁炮 answered 11 years, 10 months ago

Try reading the official docs.

只是只瓜瓜 answered 11 years, 10 months ago

from scrapy.contrib.spiders import CrawlSpider ,Rule

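You're subclassing CrawlSpider, but you haven't defined any rules, so nothing ever calls parse_torrent.

I suggest switching to the Spider class and renaming parse_torrent to parse (Spider sends the responses for start_urls to parse by default), as follows: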


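```python
from scrapy import Spider
from boke.items import BokeItem

class BokeSpider(Spider):  # a different name, so it doesn't shadow the imported BokeItem
    ...  # name, start_urls and parse() (formerly parse_torrent) as in the question
```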

Hamono answered 11 years, 10 months ago
