Scraping the Sina home page with Python returns garbled text
Here is my code:
#-*-coding:utf-8-*-
import requests
url = "http://www.sina.com.cn"
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36'}
html = requests.get(url,headers = headers)
print html.text
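The returned string:
The Sina home page uses the gb2312 character set. I tried converting between encodings in the program, but none of it seemed to help, so I reverted those changes. I also set the encoding in my editor, and that did not work either. I spent a whole evening searching online and still could not find a solution, so thanks in advance, everyone.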
kiririn
9 years, 4 months ago
Answers
H.ero
answered 9 years, 4 months ago
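requests lets you override the encoding it uses to decode the response: set html.encoding before reading html.text.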
In [1]: html.encoding
Out[1]: 'ISO-8859-1'

In [2]: html.encoding = 'gb2312'

In [3]: print html.text[:500]
<!DOCTYPE html>
<!--[30,131,1] published at 2015-08-06 23:33:28 from #130 by system-->
<html>
<head>
<link rel="icon" sizes="any" mask href="http://www.sina.com.cn/favicon.svg">
<meta name="theme-color" content="red">
<link rel="icon" type="image/x-icon" href="http://www.sina.com.cn/favicon.ico">
<meta http-equiv="Content-type" content="text/html; charset=gb2312" />
<title>新浪首页</title>
<meta name="keywords" content="新浪,新浪网,SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
<meta name="descript
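To see why the override helps, here is a minimal Python 3 sketch that needs no network access. It assumes the garbling comes from gb2312 bytes being decoded as ISO-8859-1, which is the fallback requests uses for text responses whose Content-Type header names no charset. SAMPLE, sniff_charset and decode_page are hypothetical names made up for this illustration; they are not part of requests.

```python
import re

# Hypothetical sample: a tiny page encoded the way the Sina home page declares itself.
SAMPLE = (
    '<meta http-equiv="Content-type" content="text/html; charset=gb2312" />'
    '<title>新浪首页</title>'
).encode("gb2312")

# Finds charset=... in the markup; this works on bytes because the tag is plain ASCII.
_CHARSET_RE = re.compile(rb'charset=["\']?([\w-]+)', re.IGNORECASE)


def sniff_charset(raw, default="utf-8"):
    """Return the charset declared in the first 2048 bytes of raw, else default."""
    match = _CHARSET_RE.search(raw[:2048])
    return match.group(1).decode("ascii").lower() if match else default


def decode_page(raw):
    """Decode raw HTML bytes using the charset the page itself declares."""
    charset = sniff_charset(raw)
    # GBK is a superset of GB2312, so it also handles characters strict gb2312 rejects.
    if charset == "gb2312":
        charset = "gbk"
    return raw.decode(charset, errors="replace")


wrong = SAMPLE.decode("iso-8859-1")  # the Latin-1 fallback: every byte becomes one wrong character
right = decode_page(SAMPLE)          # decoded with the declared charset

print("新浪首页" in wrong, "新浪首页" in right)  # prints: False True
```

With requests itself, the raw bytes are available as html.content, so decode_page(html.content) would be one way to apply the same idea. Setting html.encoding = html.apparent_encoding is another option: it asks the library to guess the charset from the body instead of hard-coding 'gb2312'.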
一日就是一天
answered 9 years, 4 months ago