Python crawler program hangs midway
import requests
import cchardet
import sqlite3
import time
import logging
from multiprocessing.pool import ThreadPool

def execute(url):
    total_time = time.time()  # start of overall timing (not used further in this excerpt)
    try:
        start_time = time.time()
        res = requests.get(url, timeout=1)
        download_time = time.time() - start_time
        start_time = time.time()
        # detect the encoding from the raw bytes, then decode accordingly
        res.encoding = cchardet.detect(res.content)['encoding']
        decode_time = time.time() - start_time
    except Exception:
        logging.warning(url)

pool = ThreadPool(12)
pool.map(execute, links)  # links: a list of 60,000+ URLs from many different sites
The above is part of a crawler I wrote in Python 3. The program always freezes partway through a run. I traced the problem to the line res = requests.get(url, timeout=1): if I remove that line, the program runs happily to completion. I'm new to Python, so I'd really appreciate an explanation of what is going on here. links is a list holding 60,000+ URLs from many different websites.
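(For context on the symptom: in requests, timeout=1 bounds the connection attempt and each individual socket read, not the total request time. A server that keeps trickling bytes, arriving less than a second apart, can therefore hold a worker thread far longer than one second. The sketch below is one way to enforce a hard wall-clock deadline by streaming the body in chunks; get_with_deadline, deadline, and chunk_size are illustrative names, not part of the original code.)

import time
import requests

def get_with_deadline(url, deadline=5.0, chunk_size=8192):
    # requests' timeout caps the connect phase and each single socket
    # read; it is NOT a cap on total request time. Streaming the body
    # lets us check a wall-clock deadline between chunks ourselves.
    start = time.time()
    # timeout=(connect, read): up to 3.05 s to connect, 1 s per read
    with requests.get(url, timeout=(3.05, 1), stream=True) as res:
        body = bytearray()
        for chunk in res.iter_content(chunk_size=chunk_size):
            body.extend(chunk)
            if time.time() - start > deadline:
                raise TimeoutError('total deadline exceeded for ' + url)
    return bytes(body)

This is a sketch, not a drop-in fix for every hang: it only checks the deadline between chunks, so a single stalled read can still take up to the read timeout before the check fires.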
寂寞的槑槑姬
9 years, 3 months ago