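With Scrapy, I have scraped all the category information from a site's product category page. How can I use each category's id to request the second-level page and get the details of the products in that category?
For example, running:
scrapy crawl xxxx
starts the xxxx spider, which returns the list of product categories: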
[
    {
        "classname" : "奥迪",
        "classpic" : "http://xxxxx/xx.jpg",
        "classid" : 1001,
        "producturl" : "http://xxxxx/product/1001"
    }
]
How can I use producturl or classid to automatically run the spider that fetches product details and scrape the product detail content, like this?
[
    {
        "productid" : 9000,
        "productname" : "奥迪Q5",
        "productpic" : "http://xxxx/aodi.pic",
        "classid" : 1001
    }
]
Rayzzzz
9 years, 4 months ago
Answers
This is how I wrote it; I'm not sure whether it is right:
item.py
import scrapy


class PclassItem(scrapy.Item):
    '''
    Product category item
    '''
    cid = scrapy.Field()
    cname = scrapy.Field()


class ProductItem(scrapy.Item):
    '''
    Product item
    '''
    pcid = scrapy.Field()  # id of the category the product belongs to
    pname = scrapy.Field()
    pid = scrapy.Field()
DemoSpider.py
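import scrapy

from item import PclassItem, ProductItem  # adjust to your project's package path


class DemoSpider(scrapy.spiders.Spider):
    name = "xxxx"  # the name used with `scrapy crawl xxxx`

    def parse(self, response):
        # Emit the category item first.
        item = PclassItem()
        cid = response.xpath("//xxxx").get()
        item["cid"] = cid
        yield item
        # Then request the category's product page, passing cid along in meta.
        producturl = response.xpath("//xxxx").get()
        yield scrapy.Request(response.urljoin(producturl), meta={"cid": cid},
                             callback=self.parse_product)

    def parse_product(self, response):
        item = ProductItem()
        item["pcid"] = response.meta["cid"]  # the category id belongs in pcid, not pid
        yield item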
pipelines.py
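from item import PclassItem, ProductItem  # adjust to your project's package path


class DemoPipeline(object):
    def process_item(self, item, spider):
        if isinstance(item, PclassItem):
            pass  # handle a category item here
        elif isinstance(item, ProductItem):
            pass  # handle a product item here
        return item  # pass the item on to any later pipeline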
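To make the two branches concrete, here is a rough sketch of a pipeline that routes items by type and keeps them in memory. It is an illustration only: CategoryRecord, ProductRecord and RoutingPipeline are hypothetical names, and the items are written as dataclasses (which recent Scrapy versions accept as items) so the sketch runs without a Scrapy project. A real pipeline would more likely write to a database or file.

```python
from dataclasses import dataclass


@dataclass
class CategoryRecord:
    # Hypothetical stand-in for PclassItem.
    cid: int
    cname: str


@dataclass
class ProductRecord:
    # Hypothetical stand-in for ProductItem; pcid is the owning category's id.
    pid: int
    pname: str
    pcid: int


class RoutingPipeline:
    """Stores categories and products separately (in memory, for illustration)."""

    def open_spider(self, spider):
        # Called once when the spider starts; set up the storage.
        self.categories = {}
        self.products = []

    def process_item(self, item, spider):
        if isinstance(item, CategoryRecord):
            self.categories[item.cid] = item.cname
        elif isinstance(item, ProductRecord):
            self.products.append(item)
        # Always return the item so later pipelines still receive it.
        return item
```

Because each product carries pcid, the stored products can be joined back to their category afterwards, whatever order the responses arrive in.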
Gordius
answered 9 years, 4 months ago