Question


Problem parsing a web page with BeautifulSoup

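If I do soup = BeautifulSoup(urlopen(url).read()), the page does not get parsed:
soup.findAll('') returns no nodes at all.
But if I run html = urlopen(url).read(), print html, copy the console output into a variable content, and then run soup = BeautifulSoup(content), parsing succeeds. Why?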
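One thing worth ruling out is character encoding. This is an assumption: the thread never identifies the cause. urlopen(url).read() hands BeautifulSoup raw bytes whose charset it has to detect, and copying text out of a console and into a script can change how that text is encoded. The hypothetical Python 3 sketch below avoids the network. The sample markup and the GBK charset are assumptions standing in for the real page. It parses the same bytes two ways: once with an explicit from_encoding, and once after decoding them first.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for urlopen(url).read(): a tiny page encoded as GBK.
# The charset is an assumption; the real page may use something else.
raw = '<div><h1>球员</h1><p>Do you know?</p></div>'.encode('gbk')

# Path 1: pass the raw bytes, but state the encoding instead of
# letting BeautifulSoup detect it.
soup_bytes = BeautifulSoup(raw, 'html.parser', from_encoding='gbk')

# Path 2: decode the bytes yourself and pass a str.
soup_text = BeautifulSoup(raw.decode('gbk'), 'html.parser')

# Both trees should hold the same tags and the same heading text.
print(len(soup_bytes.find_all()), len(soup_text.find_all()))
print(soup_bytes.h1.get_text() == soup_text.h1.get_text())
```

If the raw-bytes path only starts working once from_encoding is given, the original problem was likely a wrong charset guess rather than a parsing bug.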

python beautifulsoup

11 years, 3 months ago

jpshoo


Answer 1



from bs4 import BeautifulSoup
import urllib

# Parse the live page directly from the bytes returned by urlopen
url = 'http://soccerdata.sports.qq.com/playerSearch.aspx?lega=epl&pn=9'
soup = BeautifulSoup(urllib.urlopen(url).read())
print len(soup.findAll())       # no argument: every tag
print len(soup.findAll(''))     # empty-string tag name
print len(soup.findAll('div'))  # only <div> tags

# Parse a short hand-written string for comparison
content = '<div><h1>This is my homepage.</h1><p>Do you know?</p></div>'
soup2 = BeautifulSoup(content)
print len(soup2.findAll())
print len(soup2.findAll(''))
print len(soup2.findAll('div'))

Output:

856
0
38
3
0
1
[Finished in 1.5s]

Test environment:

Windows 7 SP1, Chinese edition
Python 2.7.6
BeautifulSoup 4.3.2

These are my results, for the OP's reference.

Screenshot as proof: (image not preserved)
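In the output above, findAll('') returns 0 for both soups, while findAll() with no argument returns every tag. So an empty string does not mean "all nodes". It is treated as a tag name that nothing matches. Here is a minimal Python 3 sketch of the same comparison using bs4's find_all (the modern spelling of findAll) and the built-in 'html.parser'. Choosing that parser explicitly is an assumption; with lxml the tree gains html and body wrappers, and the counts change.

```python
from bs4 import BeautifulSoup

# Same sample markup as in the answer.
content = '<div><h1>This is my homepage.</h1><p>Do you know?</p></div>'
soup = BeautifulSoup(content, 'html.parser')

all_tags = soup.find_all()        # no name: every tag in the tree
empty_name = soup.find_all('')    # '' is a tag name that never matches
divs = soup.find_all('div')       # only tags named 'div'

print(len(all_tags), len(empty_name), len(divs))
print([tag.name for tag in all_tags])
```

To get every node, call find_all() (or findAll()) with no argument rather than passing ''.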

Check which version of BeautifulSoup you are using.
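Here is a quick way to run that check. This small sketch assumes the package in use is bs4, not BeautifulSoup 3, which was a separate package imported as "from BeautifulSoup import BeautifulSoup". It prints the installed bs4 version next to the Python version, so the results can be compared with the BeautifulSoup 4.3.2 / Python 2.7.6 setup above.

```python
import sys

import bs4

# bs4 exposes its version string as bs4.__version__.
print(bs4.__version__)
# First token of sys.version is the interpreter version, e.g. "2.7.6".
print(sys.version.split()[0])
```

The single-argument print(...) calls work under both Python 2 and Python 3.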

answered 11 years, 3 months ago

对银发没辙啊！

