Problem parsing a web page with BeautifulSoup
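When I parse a page like this:
soup = BeautifulSoup(urlopen(url).read())
the page does not get parsed, and
soup.findAll('')
returns no nodes at all.
But if I run
html = urlopen(url).read()
print html, copy the output from the console, and paste it into a variable content, then
soup = BeautifulSoup(content)
parses successfully. Why?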
jpshoo
10 years, 10 months ago
Answers
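# Python 2.7 script (print statement, urllib.urlopen)
from bs4 import BeautifulSoup
import urllib

url = 'http://soccerdata.sports.qq.com/playerSearch.aspx?lega=epl&pn=9'
# Parse the live page straight from the response body
soup = BeautifulSoup(urllib.urlopen(url).read())
print len(soup.findAll())       # every tag in the document
print len(soup.findAll(''))     # tags whose name is the empty string
print len(soup.findAll('div'))  # <div> tags only

# Parse a small literal string the same way
content = '<div><h1>This is my homepage.</h1><p>Do you know?</p></div>'
soup2 = BeautifulSoup(content)
print len(soup2.findAll())
print len(soup2.findAll(''))
print len(soup2.findAll('div'))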
Output:
856
0
38
3
0
1
[Finished in 1.5s]
Test platform:
- Windows 7 SP1 (Chinese edition)
- Python 2.7.6
- BeautifulSoup 4.3.2
These are the results I got, for the original poster's reference.
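One thing worth noting in the numbers above: findAll('') printed 0 for both the live page and the literal string, while findAll() and findAll('div') found tags in both. So an empty result from findAll('') does not by itself show that parsing failed; the call asks for tags whose name is the empty string. Below is a minimal sketch of the three calls on the same sample string. It assumes bs4 is installed, uses find_all (the bs4 spelling of findAll), and names 'html.parser' explicitly, which is my choice and not something the test above did. It is written to run on both Python 2 and Python 3.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the test above.
content = '<div><h1>This is my homepage.</h1><p>Do you know?</p></div>'

# Assumption: the parser is named explicitly so the result does not
# depend on which optional parsers happen to be installed.
soup = BeautifulSoup(content, 'html.parser')

all_tags = soup.find_all()       # no name filter: every tag
div_tags = soup.find_all('div')  # only <div> tags
empty_name = soup.find_all('')   # tags named '' (printed 0 in the run above)

print(len(all_tags))   # 3: div, h1, p
print(len(div_tags))   # 1
print(len(empty_name))
```

If find_all() on the downloaded page also returns nothing, the problem is more likely in the download or in the installed library than in the query.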
Check which version of BeautifulSoup (bs) you are using.
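A quick way to check is to print the version from the interpreter that runs your script. This is only a sketch: the variable name installed is made up for illustration, and the fallback to the older BeautifulSoup 3 module assumes you might be on Python 2 with that package installed in place of bs4.

```python
# Report which BeautifulSoup generation, if any, can be imported here.
try:
    import bs4  # BeautifulSoup 4 package
    installed = 'bs4 ' + bs4.__version__
except ImportError:
    try:
        import BeautifulSoup as bs3  # BeautifulSoup 3 module (Python 2 only)
        installed = 'BeautifulSoup ' + bs3.__version__
    except ImportError:
        installed = 'not installed'

print(installed)
```

If this reports something other than the 4.3.2 used in the test above, that difference is a reasonable first thing to rule out.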
对银发没辙啊!
answered 10 years, 10 months ago