Re: [Question] Scraping with requests.post, and an encoding problem

Board: Python | Author: iPhone007 (iPhone007) | Posted: 9 years ago (2016/07/01 16:18), edited 9 years ago | Score: 0 (0 push, 0 boo, 5 →)

5 comments, 3 participants, thread 2/2

I brute-forced it with the approach below. It is not a great method, but it does seem to extract the data. Hopefully this rough attempt prompts others to post a better solution.

input_year = '105'    # ROC calendar year (105 = 2016)
input_month = '06'

import requests
from bs4 import BeautifulSoup

url = 'http://www.twse.com.tw/ch/trading/indices/MI_5MINS_HIST/MI_5MINS_HIST.php'
payload = {
    'myear': input_year,
    'mmon': input_month
}
res = requests.post(url, data=payload)
res.encoding = 'big5'    # decode the response body as Big5

# Cut out the part of the page that holds the data table
idx_bgn = res.text.index(u"<div align=center class=til_2>")
idx_end = res.text.index(u"")    # the end-marker string is empty in this copy of the post
html_text = res.text[idx_bgn:idx_end]

soup = BeautifulSoup(html_text)
table_board_trad = soup.select('table')[1]

trCount = 0
for tr in table_board_trad.select('tr'):
    if trCount > 1:    # skip the first two rows
        print(tr.select('td')[0].text + ' ' + tr.select('td')[1].text)
    trCount = trCount + 1

※ Quoting akpipnlge (akpipnlge):
: My project requires scraping some data from the Taiwan Stock Exchange website, so I am using Python 2.7 with the requests package.
: The URL is:
: http://www.twse.com.tw/ch/trading/indices/MI_5MINS_HIST/MI_5MINS_HIST.php
: (I need to scrape every month.)
: The code is:
: import requests
: payload = {
:     'myear': 2016,
:     'mmom': 5
: }
: url='http://www.twse.com.tw/ch/trading/indices/MI_5MINS_HIST/MI_5MINS_HIST.php'
: page = requests.post(url, data=payload)
: print page.text.decode('iso-8859-1').encode('utf8')
: Then I ran into two problems:
: 1. I do get something back, but only the unimportant parts; the data itself is completely missing.
: (The payload is probably wrong. Sorry, I have never even written HTML QQ)
: 2. The downloaded text is garbled, so I added the decoding line I found while searching, but it raises an error:
: UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position
: 130: ordinal not in range(128)
: I have spent three full half-days on this and still cannot get it working, so I am posting to ask for help QQ

--
※ Origin: PTT (ptt.cc), from: 223.26.109.76
※ Article URL: https://www.ptt.cc/bbs/Python/M.1467389938.A.1FC.html
※ Edited: iPhone007 (223.26.109.76), 07/02/2016 00:23:38
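On the encoding question (problem 2 in the quoted post), the following is only a rough sketch of why the text comes out garbled and why setting the encoding to Big5 helps. It makes no network request: the sample string and the helper name decode_body are made up for illustration, and it assumes the page really is Big5-encoded, as the reply's res.encoding = 'big5' suggests. In Python 2, calling .decode() on page.text (already a unicode object) first encodes it implicitly with the ASCII codec, which is most likely where the UnicodeEncodeError comes from.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: no network access. The sample text stands in
# for a Big5-encoded response body; decode_body is a hypothetical helper.


def decode_body(raw_bytes, encoding="big5"):
    """Decode raw response bytes using an explicitly chosen encoding."""
    return raw_bytes.decode(encoding)


sample = u"發行量加權股價指數"        # hypothetical sample text
raw = sample.encode("big5")           # bytes as a Big5 page would deliver them

# Wrong codec: each byte is mapped to one Latin-1 character, so the text is garbled.
garbled = raw.decode("iso-8859-1")

# Right codec: the original text comes back.
correct = decode_body(raw)

# Latin-1 maps bytes one-to-one, so garbled text can be turned back into
# the original bytes and decoded again with the right codec.
repaired = garbled.encode("iso-8859-1").decode("big5")

print(garbled == sample)
print(correct == sample)
print(repaired == sample)
```

With requests, assigning res.encoding = 'big5' before reading res.text has the same effect as the explicit decode here, so no manual decode/encode round trip should be needed.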

→ akpipnlge 07/02 06:38 (1F)
→ akpipnlge 07/02 06:40 (2F)
→ akpipnlge 07/02 06:40 (3F)