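Python: How can GIFs be downloaded with the ImagesPipeline component of the Scrapy framework?
By default, when Scrapy's ImagesPipeline component downloads an image, it saves the image as JPEG regardless of whether the original was a PNG or a GIF.
By overriding the file_path method, the image can be saved with its original file name and extension.
Overriding the file_path method: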
# coding: utf-8
__author__ = 'Fly'

# In Scrapy 1.0 and later this module is scrapy.pipelines.images.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem


class MyImagesPipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None):
        # Use the last URL segment (e.g. "1.gif") as the file name.
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % image_guid

    def get_media_requests(self, item, info):
        # Schedule one download per URL listed on the item.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Drop items for which no image was downloaded successfully.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item
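A side note on the file name: `request.url.split('/')[-1]` keeps any query string or fragment in the name. Below is a minimal sketch, using only the standard library, of a more defensive way to derive it. The helper name `filename_from_url` and the `default` fallback are hypothetical and not part of Scrapy.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="image"):
    """Return the last path segment of a URL, without query or fragment.

    Hypothetical helper for illustration; falls back to `default`
    when the URL path has no final segment (e.g. it ends with "/").
    """
    path = urlparse(url).path               # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))  # decode %xx, keep last segment
    return name or default


if __name__ == "__main__":
    print(filename_from_url("http://www.baidu.com/1.gif"))
```

Inside `file_path`, the return value could then become `'full/%s' % filename_from_url(request.url)`.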
Result
Image URL:
http://www.baidu.com/1.gif
Saved locally as:
1.gif
However, when I open 1.gif, the originally animated image has become static.
Does anyone know how to handle this?
Ritsu☆律
11 years, 2 months ago
Answers
Try overriding convert_image:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/pipeline/images.py#L87
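# coding: utf-8
__author__ = 'Fly'

from io import BytesIO

# In Scrapy 1.0 and later this module is scrapy.pipelines.images.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem


class MyImagesPipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None):
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % image_guid

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item

    def convert_image(self, image, size=None):
        # Skip the default RGB/JPEG conversion and re-save in the source format.
        # Saving to a buffer needs an explicit format; image.format is set for
        # images opened from downloaded data.
        # Note: Pillow writes only the first frame unless save_all=True is passed.
        buf = BytesIO()
        image.save(buf, image.format)
        return image, buf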
Give it a try; it may raise errors. According to the documentation, this pipeline will:
Convert all downloaded images to a common format (JPG) and mode (RGB)
Avoid re-downloading images which were downloaded recently
Thumbnail generation
Check images width/height to make sure they meet a minimum constraint
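Because the pipeline re-encodes every download, one possible way to keep an animation is to store the original bytes untouched whenever the download is a GIF. The sketch below shows only that decision, using the standard library; `is_gif` and `bytes_to_store` are hypothetical helpers, and wiring them into a pipeline (where the original bytes would be `response.body`) is left as an assumption.

```python
# GIF files start with one of these two 6-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_gif(body):
    """Return True if the raw bytes begin with a GIF signature."""
    return body[:6] in GIF_SIGNATURES


def bytes_to_store(body, reencode):
    """Pick the bytes to write to disk (hypothetical helper).

    GIFs are returned unchanged so that all frames survive; anything
    else goes through `reencode`, a callable standing in for whatever
    conversion the pipeline would normally apply.
    """
    return body if is_gif(body) else reencode(body)


if __name__ == "__main__":
    fake_gif = b"GIF89a" + b"\x00" * 16
    # The GIF bytes come back untouched; the stand-in converter is not called.
    print(bytes_to_store(fake_gif, lambda b: b"re-encoded") == fake_gif)
```

The signature check is used instead of the file extension because a URL ending in .gif does not guarantee GIF content.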
bbzhu
answered 11 years, 2 months ago