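[Question] Fetching a site that requires a login with urllib2
The page asks for a username and password.
I followed this document:
http://tinyurl.com/4kua9b8
The site I want to fetch:
http://tinyurl.com/3smpx27
The script follows the document; I only changed the username and password, and the last line.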
import re,urllib2
from BeautifulSoup import BeautifulSoup
theurl = 'http://www.agileinsights.com/test/ROMI/?page_id=11'
username = '*****'
password = '*****'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
foo = urllib2.urlopen(theurl).read()
Printing foo still gives the page that asks for a username and password.
How can I log in?
Thanks!
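
A possible explanation, offered as a sketch rather than a confirmed diagnosis: urllib2.HTTPBasicAuthHandler only sends the credentials after the server replies with status 401 and a "WWW-Authenticate: Basic" header. A page that shows an HTML login form usually returns 200 instead, in which case the handler never acts. The helper names looks_like_basic_auth and probe below are hypothetical, and the import fallback is only there so the same code also runs on Python 3.

```python
try:  # Python 2, as used in this thread
    from urllib2 import urlopen, HTTPError
except ImportError:  # Python 3 moved these names
    from urllib.request import urlopen
    from urllib.error import HTTPError


def looks_like_basic_auth(status, headers):
    """Return True if (status, headers) is an HTTP Basic auth challenge."""
    challenge = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "www-authenticate":
            challenge = value
    return status == 401 and challenge.strip().lower().startswith("basic")


def probe(url):
    """Fetch url without credentials and report whether Basic auth is requested."""
    try:
        resp = urlopen(url)
        return looks_like_basic_auth(resp.getcode(), dict(resp.info().items()))
    except HTTPError as err:
        # urlopen raises HTTPError for a 401; the headers are still available.
        return looks_like_basic_auth(err.code, dict(err.headers.items()))
```

If probe(theurl) returns False, the site is probably not using HTTP Basic authentication, and the login has to go through its HTML form instead.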
--
※ Posted from: PTT (ptt.cc)
◆ From: 119.40.37.34
Thanks, the method he suggested works for xiami,
but I still cannot fetch the site I want: http://tinyurl.com/3smpx27
import urllib
import urllib2
posturl = "http://www.agileinsights.com/test/ROMI/wp-login.php?redirect_to=/test/ROMI/"
values = {"user_login":"Tsung-Hsien", "user_pass":"ct5d$cswi^L7",
"wp-submit":"Log In", "rememberme":"forever"}
data = urllib.urlencode(values)
req = urllib2.Request(posturl, data)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
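I really cannot work out how to fix this; I suspect something is wrong with the POST.
The same approach works for xiami.
The username and password above do log in from a browser. Please help! Thanks!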
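
Two things may be worth checking here, both stated as assumptions rather than a confirmed fix. First, a stock WordPress login form uses user_login and user_pass as the input ids, while the name attributes that are actually posted are log and pwd; the page source of wp-login.php should confirm which names this site expects. Second, a form login is remembered through cookies, and a plain urllib2.urlopen call does not keep them across the redirect that follows the POST. The sketch below uses a cookie-aware opener; make_opener, encode_login and login_and_fetch are hypothetical helper names, and the import fallback only exists so the code also runs on Python 3.

```python
try:  # Python 2, as used in this thread
    import urllib2 as request
    import cookielib as cookiejar
    from urllib import urlencode
except ImportError:  # Python 3 moved these modules
    import urllib.request as request
    import http.cookiejar as cookiejar
    from urllib.parse import urlencode


def make_opener():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = cookiejar.CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_login(username, password):
    """Encode the form body; field names assume a stock WordPress login form."""
    fields = [
        ("log", username),          # assumption: name attribute of the user field
        ("pwd", password),          # assumption: name attribute of the password field
        ("wp-submit", "Log In"),
        ("rememberme", "forever"),
    ]
    return urlencode(fields).encode("ascii")


def login_and_fetch(login_url, page_url, username, password):
    """Log in through the form, then fetch page_url with the same cookies."""
    opener, _jar = make_opener()
    opener.open(login_url).read()  # pick up any cookie the login page sets
    opener.open(login_url, encode_login(username, password)).read()
    return opener.open(page_url).read()
```

A call would look like login_and_fetch(posturl, theurl, username, password), reusing the variable names from the scripts in this post. If the returned page is still the login form, comparing the request a browser sends with the body built by encode_login is a reasonable next step.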
※ Edited: Jason1122 from: 119.40.37.34 (10/13 15:44)