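[Question] Reading UTF-8 encoded text
Platform: (e.g. VC++, GCC, Linux, ...)
Linux, GCC
Question:
The file is XML,
encoded in UTF-8.
If I handle it with char *[], splitting on punctuation later fails.
So I looked around and found that I could handle it with wchar_t *[] instead,
but the result is not what I expected.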
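One thing worth checking (a guess, since the gist is not reproduced here): a C program starts in the "C" locale, and the wide-character functions convert bytes according to the current LC_CTYPE locale, so UTF-8 input usually cannot be converted to wchar_t until setlocale has selected a UTF-8 locale. A minimal sketch under that assumption follows. The helper names use_utf8_locale and utf8_to_wide and the list of candidate locale names are made up for illustration, and which locales exist depends on the system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try to switch LC_CTYPE to some UTF-8 locale.
 * Note that this changes the process-wide locale.
 * Returns 1 on success, 0 if none of the candidates is available. */
static int use_utf8_locale(void)
{
    /* "" means "take the locale from the environment (LANG, LC_*)".
     * The other names are assumptions about what may be installed. */
    static const char *const candidates[] = {
        "", "C.UTF-8", "en_US.UTF-8", "zh_TW.UTF-8"
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL &&
            strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to wchar_t.
 * Returns the number of wide characters written (excluding the
 * terminator), or -1 if no UTF-8 locale is available or the input is
 * not valid in that locale. */
static long utf8_to_wide(const char *utf8, wchar_t *out, size_t cap)
{
    if (cap == 0 || !use_utf8_locale())
        return -1;
    size_t n = mbstowcs(out, utf8, cap - 1);
    if (n == (size_t)-1)
        return -1;
    out[n] = L'\0'; /* mbstowcs does not terminate when the buffer fills */
    return (long)n;
}
```

The same setlocale call would also be needed before fgetws, fwscanf or wprintf, because they perform the same locale-dependent conversion on the stream.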
Input:
The link below is the XML file
https://dl.dropboxusercontent.com/u/100819329/file.zip
Expected Output:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<p>
<id>01</id>
<p>你好</p>
Wrong Output:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<p>
<id>01</id>
<p>
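Code: (please use the paste sites listed in the pinned post, and remember to format it)
https://gist.github.com/anonymous/11058612
Supplement:
I have been at this from 6 PM until 2 AM and still cannot get it working @@
Or is there some other way to read a UTF-8 XML file correctly @@?
Afterwards it needs to support splitting the text into substrings by punctuation..
Any pointers from the experts would be appreciated!
Thanks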
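On the splitting part, one option that avoids wchar_t altogether is to keep the text as UTF-8 in plain char and treat each punctuation mark as a multi-byte delimiter. In UTF-8 the lead bytes and continuation bytes of a character never overlap, so a delimiter's byte sequence cannot match in the middle of another character. The sketch below assumes a small, made-up delimiter set (DELIMS) and hypothetical helper names (delim_len_at, split_utf8); it is not taken from the linked gist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical delimiter set, written as UTF-8 byte sequences. */
static const char *const DELIMS[] = {
    "\xEF\xBC\x8C", /* U+FF0C fullwidth comma */
    "\xE3\x80\x82", /* U+3002 ideographic full stop */
    "\xEF\xBC\x81", /* U+FF01 fullwidth exclamation mark */
    "\xEF\xBC\x9F", /* U+FF1F fullwidth question mark */
    ",", ".", "!", "?"
};

/* Length in bytes of the delimiter starting at p, or 0 if none does. */
static size_t delim_len_at(const char *p)
{
    for (size_t i = 0; i < sizeof DELIMS / sizeof DELIMS[0]; i++) {
        size_t len = strlen(DELIMS[i]);
        /* strncmp stops at the NUL in p, so this never reads past it. */
        if (strncmp(p, DELIMS[i], len) == 0)
            return len;
    }
    return 0;
}

/* Split text in place (like strtok, it overwrites delimiters with NUL).
 * Pointers to at most max non-empty pieces are stored in out.
 * Returns the number of pieces stored. */
static size_t split_utf8(char *text, char **out, size_t max)
{
    size_t count = 0;
    char *start = text;
    char *p = text;

    while (*p != '\0') {
        size_t len = delim_len_at(p);
        if (len == 0) {
            p++; /* ordinary byte: keep scanning */
            continue;
        }
        *p = '\0'; /* terminate the current piece */
        if (p > start && count < max)
            out[count++] = start;
        p += len; /* skip the whole delimiter */
        start = p;
    }
    if (p > start && count < max)
        out[count++] = start; /* trailing piece after the last delimiter */
    return count;
}
```

Called on a writable buffer holding the text of one element, split_utf8 fills out with pointers into that same buffer, so the pieces stay valid only as long as the buffer does. Extracting the element text from the XML in the first place is a separate job for an XML parser.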
--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 111.249.189.178
※ Article URL: http://www.ptt.cc/bbs/C_and_CPP/M.1397846723.A.CEC.html
※ Edited: wsx100 (111.249.189.178), 04/19/2014 02:53:49