How do I set a proxy (GoAgent/GoAgentX) for requests in a Python crawler?


Skeleton code:


# -*- coding: utf-8 -*-
import requests

s = requests.session()
login_data = {'email': 'myEmail', 'password': 'psw'}
# Post the login form, then request a page that requires the logged-in session.
s.post('https://www.facebook.com/', data=login_data)
r = s.get('http://www.facebook.com/people/someone/followers')
print(r.text)

With minor changes this can scrape data from Zhihu, but it does not work against Facebook.

How do I set a proxy for requests?

Environment: OS X, GoAgentX, global proxy mode.

Following a suggestion from a fellow coder:


import requesocks as requests

session = requests.session()

session.proxies = {'http': 'socks5://127.0.0.1:8087',
                   'https': 'socks5://127.0.0.1:8087'}
resp = session.get('https://www.facebook.com/', auth=('myEmail', 'psw'))

print(resp.text)


各种杯具! 11 years, 2 months ago

First make sure your proxy actually works. I use GoAgent, and the code below passed my testing.


#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

import re

import requests

s = requests.Session()

# GoAgent listens as a local HTTP proxy on port 8087, so both schemes use http://.
proxies = {
    'http': 'http://127.0.0.1:8087',
    'https': 'http://127.0.0.1:8087',
}

login_data = {
    'email': '[email protected]',
    'pass': 'mima',
}

login_url = 'https://www.facebook.com/login.php?login_attempt=1'

# verify=False skips certificate checks for HTTPS traffic that passes through the proxy.
r = s.get(login_url, proxies=proxies, verify=False)

# The login form carries hidden fields that must be posted back with the credentials.
lgm = re.search(r'name="lgnrnd" value="([^"]+)"', r.text, re.I | re.S)
lsm = re.search(r'name="lsd" value="([^"]+)"', r.text, re.I | re.S)
if lgm is None or lsm is None:
    print('cannot get form param: lgnrnd or lsd')
else:
    login_data.update({'lgnrnd': lgm.group(1), 'lsd': lsm.group(1)})

    # params= sends the fields in the query string; data=login_data would put them in the POST body.
    resp = s.post(login_url, params=login_data, proxies=proxies, verify=False)
    # A response body containing "sk=welcome" is taken as the sign of a successful login.
    if re.search('sk=welcome', resp.text, re.I | re.S) is not None:
        print('login success.')
    else:
        print('login failed.')
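
To act on the advice about confirming the proxy first, here is a minimal sketch that checks whether anything is accepting connections on the proxy port and then builds a session that routes all requests through it. The helper names `proxy_is_listening` and `make_proxied_session` are hypothetical, and the address 127.0.0.1:8087 is assumed from the code above. A listening port only shows that the proxy process is running, not that it can reach the target site.

```python
import socket

import requests

# Assumption: GoAgent/GoAgentX runs as an HTTP proxy on 127.0.0.1:8087,
# the address used in the answer above.
PROXY_HOST = '127.0.0.1'
PROXY_PORT = 8087


def proxy_is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def make_proxied_session(host, port):
    """Build a requests.Session whose requests go through an HTTP proxy."""
    proxy_url = 'http://{0}:{1}'.format(host, port)
    session = requests.Session()
    # Setting proxies on the session avoids passing proxies= on every call.
    session.proxies = {'http': proxy_url, 'https': proxy_url}
    return session


if __name__ == '__main__':
    if proxy_is_listening(PROXY_HOST, PROXY_PORT):
        s = make_proxied_session(PROXY_HOST, PROXY_PORT)
        print('proxy is up; session proxies: {0}'.format(s.proxies))
    else:
        print('nothing is listening on {0}:{1}'.format(PROXY_HOST, PROXY_PORT))
```

With the session built this way, calls such as `s.get(url, verify=False)` no longer need an explicit `proxies=` argument.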

天空的荣耀 answered 11 years, 2 months ago
