[Question] Problem handling Chinese text when scraping data from a web page

Board: Python  Author: elmo56 (政大柯景騰)  Time: 2014/10/25 20:13  Score: 1 (1 push, 0 boo, 9 →)

10 comments, 5 participants, thread 1/1

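I'm on Python 2.7 and using BeautifulSoup to scrape a web page. I've pulled the values out of a table, for example:

    a[2] = <td> 雅虎新聞 Yahoo news </td>
    a[3] = <td> 四 thr </td>

I also used

    a[2] = a[2].get_text()

to strip the tags and keep only the text. If I now run

    print a[2], a[3]

the result is:

    雅虎新聞 Yahoo news 四 thr

The problem is that if I create a list and append the values to it:

    newslist = []
    newslist.append(a[2])
    newslist.append(a[3])

and then print newslist, the Chinese characters come out garbled while the English is fine. Printing a single position works:

    print newslist[0]

shows 雅虎新聞 Yahoo news, but

    print newslist

turns into

    u'\u4eda\u623f\u4eds\ Yahoo news u'\u4dsw thr

(I typed those escape codes at random, but that is what it looks like.) Whenever I print a whole list or dict it comes out messy like this, so I'm posting to ask for help. Thanks, everyone.

--
※ Sent from: PTT (ptt.cc), from: 140.119.164.134
※ Article URL: http://www.ptt.cc/bbs/Python/M.1414239194.A.6AB.html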
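For readers hitting the same symptom: in Python 2, printing a list shows the repr() of each element, so unicode strings appear as \uXXXX escapes even though the underlying text is intact. The sketch below shows two common ways to get a readable dump. It assumes get_text() returned unicode strings (as the u'...' output in the post suggests); the helper name format_list and the hard-coded sample values are stand-ins for illustration, not part of the original post.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import json


def format_list(items):
    """Join the text of each item so the characters themselves are shown,
    instead of the repr() escapes that printing a list produces in Python 2.
    (Hypothetical helper, named here only for illustration.)"""
    return "[" + ", ".join(items) + "]"


# Stand-ins for the values returned by get_text() in the post.
newslist = []
newslist.append("雅虎新聞 Yahoo news")
newslist.append("四 thr")

# Printing one element shows its text directly.
print(newslist[0])

# Print the elements joined into one string, rather than the list object.
print(format_list(newslist))

# json.dumps with ensure_ascii=False is another way to get a readable dump;
# it also works for dicts.
print(json.dumps(newslist, ensure_ascii=False))
```

Either approach only changes how the list is displayed; the strings stored in newslist are the same in both cases.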

→ alibuda174 10/25 20:53, 1F

※ Edited: elmo56 (140.119.164.134), 10/25/2014 20:56:23

→ alibuda174 10/25 20:54, 2F

※ Edited: elmo56 (140.119.164.134), 10/25/2014 21:01:51

→

alibuda174