Problem parsing a web page with BeautifulSoup


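When I do soup = BeautifulSoup(urlopen(url).read()), the page does not get parsed:
soup.findAll('') returns no nodes at all.
But if I run html = urlopen(url).read(), print html, copy the output from the console and paste it into a variable content, then soup = BeautifulSoup(content) parses successfully. Why is that?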

python beautifulsoup

jpshoo 10 years, 10 months ago

from bs4 import BeautifulSoup
import urllib

# Case 1: parse the page fetched over the network (Python 2 urllib).
url = 'http://soccerdata.sports.qq.com/playerSearch.aspx?lega=epl&pn=9'
soup = BeautifulSoup(urllib.urlopen(url).read())
print len(soup.findAll())       # every tag in the document
print len(soup.findAll(''))     # empty tag name
print len(soup.findAll('div'))  # only <div> tags

# Case 2: parse a string literal.
content = '<div><h1>This is my homepage.</h1><p>Do you know?</p></div>'
soup2 = BeautifulSoup(content)
print len(soup2.findAll())
print len(soup2.findAll(''))
print len(soup2.findAll('div'))

Output:


856
0
38
3
0
1
[Finished in 1.5s]

Test platform:

  • Windows 7 SP1 (Chinese edition)
  • Python 2.7.6
  • BeautifulSoup 4.3.2

These are the results from my run, for the original poster's reference.
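
Since the calls above do not name a parser, bs4 picks whichever one is installed, so tag counts can differ from one machine to another. Below is a minimal sketch that pins the parser to the standard library's html.parser to make results easier to compare. The helper name count_tags is hypothetical, and the snippet is written to run on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup


def count_tags(markup, name=None):
    """Parse markup with the stdlib parser and count the matching tags."""
    # Naming "html.parser" avoids depending on whether lxml or html5lib is installed.
    soup = BeautifulSoup(markup, "html.parser")
    # With name=None, find_all() matches every tag; findAll is the older alias.
    return len(soup.find_all(name))


# Bytes input, similar in type to what urlopen(url).read() returns.
html = b"<div><h1>This is my homepage.</h1><p>Do you know?</p></div>"
print(count_tags(html))         # count of all tags
print(count_tags(html, "div"))  # count of <div> tags only
```

Running the same helper on the bytes returned by urlopen(url).read() and on the pasted string should show whether the two inputs really parse differently.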


A screenshot as proof:
[image]

Check which version of BeautifulSoup you are using.
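
One way to check, as a rough sketch: bs4 exposes its version as bs4.__version__. The function name installed_bs4_version is hypothetical. If the import fails, bs4 is not installed in that interpreter, and the script may be using the older BeautifulSoup 3 package instead.

```python
def installed_bs4_version():
    """Return the installed bs4 version string, or None if bs4 is missing."""
    try:
        import bs4
    except ImportError:
        return None
    return bs4.__version__


version = installed_bs4_version()
if version is None:
    print("bs4 is not installed; the older BeautifulSoup 3 package may be in use")
else:
    print("bs4 version: %s" % version)
```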

对银发没辙啊! answered 10 years, 10 months ago
