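[Question] Chinese text comes out garbled when fetching page source
Page being fetched: http://isin.twse.com.tw/isin/C_public.jsp?strMode=2
I'm using Python 3 and running the script from Sublime.
But the Chinese text prints as byte escapes such as \xa1@\xa5x\xaad
In the Python console, >>> b'\xa1@\xa5x\xaad'.decode('utf-8') fails to decode it.
How should I solve this? Encodings are really annoying...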
---
# -*- coding:utf8 -*-
import urllib.request as urllib2
import sys
# Send a browser-like User-Agent with the request
headers = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
req = urllib2.Request('http://isin.twse.com.tw/isin/C_public.jsp?strMode=2' , headers=headers)
# read() returns raw bytes, not str
content = urllib2.urlopen(req).read()
# Printing bytes shows non-ASCII data as \x.. escapes
print(content)
---
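One possible explanation, offered as a sketch rather than a confirmed answer: the bytes in the question look like Big5 (Windows code page 950) rather than UTF-8, which would explain why .decode('utf-8') fails. The post does not state the page's encoding, so 'cp950' here is an assumption, and the helper names decode_big5 and fetch_text are hypothetical.

```python
import urllib.request

URL = 'http://isin.twse.com.tw/isin/C_public.jsp?strMode=2'


def decode_big5(raw, encoding='cp950'):
    """Decode bytes that are ASSUMED to be Big5/CP950.

    errors='replace' turns any undecodable byte into U+FFFD
    instead of raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors='replace')


def fetch_text(url=URL, encoding='cp950'):
    """Fetch a page and return it as str (hypothetical helper)."""
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req) as resp:
        return decode_big5(resp.read(), encoding)


# The sample bytes from the question: three two-byte Big5 characters.
sample = b'\xa1@\xa5x\xaad'
text = decode_big5(sample)
print(repr(text))
```

If the decoded text still looks wrong, it may be worth checking the charset declared in the response's Content-Type header or in the page's meta tag before picking a codec. Calling fetch_text() would apply the same decoding to the whole page.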
--
※ Posted from: PTT (ptt.cc), from: 61.231.192.105
※ Article URL: https://www.ptt.cc/bbs/Python/M.1435937158.A.4A3.html