Re: [Question] Chinese typed on the keyboard shows up as garbled text

Board: java | Author: sbrhsieh (十年～) | Time: 2013/07/18 16:22


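※ Quoting danielkimo (Daniel):
: Dear all,
: I googled this kind of problem for a long time and only found material in Simplified Chinese.
: Here is my code. Everything has already been switched to UTF-8, but the text is still garbled.
: BufferedReader br=new BufferedReader(new InputStreamReader(System.in,"UTF-8"));
: BufferedWriter bw=new BufferedWriter(new OutputStreamWriter(System.out,"UTF-8"));
: bw.write("請輸入你想查詢的字,輸入完畢請按Enter：");
: bw.flush();
: String str=br.readLine();
: bw.write(str);
: bw.flush();
: Please help, thank you.

The garbled text you get when running this program inside the Eclipse IDE comes mainly from the input being corrupted, so the output cannot be what you expect.

As long as the encoding set on the Common tab of your Run configuration matches the encoding used when constructing the InputStreamReader (UTF-8 in your code, for both the reader and the writer), anything written through bw should come out correctly. String literals, of course, also depend on whether the encoding of your source file matches the encoding specified at compile time.

As for why the input is corrupted (on Windows, assuming the JRE default encoding was Big5 when Eclipse was started): once your program is running, the System.in InputStream converts the data read from the console twice, a decode followed by a re-encode, before handing it to you. An example (all values in hexadecimal):

Suppose you type this in the console that Eclipse provides:

    漢 Enter        <== 漢 is \u6f22, Enter is \r\n

The byte sequence sent is:

    E6 BC A2 0D 0A

E6 BC A2 is the UTF-8 encoding of the character '漢'. The System.in provided by the Java runtime first tries to decode these bytes as Big5 into a Unicode string (using the replace strategy), which yields "\u778d\ufffd\n". \ufffd is the replacement character that CharsetDecoder substitutes when part of the byte sequence cannot be handled by the encoding being used. Finally, "\u778d\ufffd\n" is encoded as UTF-8, and that becomes what System.in delivers:

    "\u778d\ufffd\n".getBytes("UTF-8") => E7 9E 8D EF BF BD 0A

In other words, when you type 漢 Enter, what you read from the System.in InputStream has already been corrupted into the bytes E7 9E 8D EF BF BD 0A. No matter which encoding you use to interpret that data, there is essentially no way to recover the original input.

I do not yet know the exact reason for the extra decode/encode pair. I tried setting the sun.jnu.encoding property to UTF-8 and it did not help. When you set the encoding to UTF-8 on the Common tab of the run configuration, the file.encoding property of the JVM instance that runs your program is correctly set to UTF-8. However, the encoding the JRE assumes while bridging standard input to System.in seems to be influenced by the parent process (which is why I originally thought setting sun.jnu.encoding would help).

The only approach I have found that works so far is to make the default encoding (file.encoding) of Eclipse itself UTF-8 as well. That avoids the extra decode/encode layer, so the input is no longer corrupted.

* You can add the line -Dfile.encoding=UTF-8 to eclipse.ini (among the VM arguments, that is, after the -vmargs line) and then restart the Eclipse IDE.

--
※ Origin: 批踢踢實業坊 (ptt.cc) ◆ From: 1.172.243.161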
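Not part of the original post: the extra decode/encode pair described in it can be imitated offline with a few lines of Java. This is only a sketch of the effect, not of what the JRE actually does internally. The class name DoubleTranscodeDemo and the helpers transcode and hex are made up for illustration. The Big5 branch is guarded because that charset is not guaranteed to exist in every runtime, and the exact bytes it prints may differ between JDK versions from the E7 9E 8D EF BF BD 0A reported in the post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates "decode with the wrong charset, then
// re-encode", which is the extra conversion the post describes.
class DoubleTranscodeDemo {

    // Decode raw bytes with the charset that is wrongly assumed, then
    // re-encode the resulting string in the target charset.
    // new String(byte[], Charset) replaces malformed input with the
    // charset's replacement string instead of throwing.
    static byte[] transcode(byte[] raw, Charset assumed, Charset target) {
        String decoded = new String(raw, assumed);
        return decoded.getBytes(target);
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the console sends for the character U+6F22 followed by Enter.
        byte[] typed = "\u6f22\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("typed:          " + hex(typed));

        // Matching charsets: the round trip is lossless.
        byte[] same = transcode(typed, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8: " + hex(same));

        // Mismatched charsets: the bytes are damaged before user code sees them.
        if (Charset.isSupported("Big5")) {
            byte[] broken = transcode(typed, Charset.forName("Big5"), StandardCharsets.UTF_8);
            System.out.println("Big5  -> UTF-8: " + hex(broken));
        } else {
            System.out.println("Big5 is not available in this runtime");
        }
    }
}
```

The point of the sketch is that once a mismatched decode has replaced part of the input with the replacement character, the original bytes are gone, so no choice of encoding in the InputStreamReader can bring them back.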


Article ID (AID): #1HvwM_Zl (java)