Re: [Question] What is an OEM string?

Board: Programming  Author: purpose (purpose)  Time: 2012/09/18 09:21  Pushes: 3 (3 push, 0 boo, 1 →)

4 comments, 4 participants, post 2/2 in this thread

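※ Quoting jonce007 (汪汪):
: I have been reading a mailing-list discussion lately, and one message said:
: Hi,
: Never use *ANSITOOEM* or any other similar functions.
: All what is necessary is correct CP set in HVM by
: hb_cdpSelect( <cCP> ) or SET( _SET_CODEPAGE, <cCP> )
: (both functions make the same operation)
: best regards,
: P******
: I know that the codepage for big5 is cp950,
: but I have never used ANSIToOEM() or OEMTOANSI(),
: and I have never understood what an ANSI string or an OEM string is.
: I could not find an explanation in Chinese on Google either.
: I am a self-taught programmer, so I would appreciate an explanation from the veterans here ^^

Any text file is stored by the hardware in binary, i.e. 010101...

On top of that sits something called an encoding. First the bits are grouped one byte at a time, "1 byte - 1 byte - 1 byte ...", which in hexadecimal looks like "0x41-0x42-0x43". Then the encoding specifies that
  0x41 stands for the letter 'A'
  0x42 stands for the letter 'B'
  0x43 stands for the letter 'C'

Some languages are more troublesome. Chinese, Japanese and Korean have far more characters than one byte can hold, so whenever a byte value in the high range (roughly 0x80 and above) shows up, it is a lead byte that has to be combined with the byte that follows it.
  0xA4CF-0xABFC-0xBCD0
is the three characters 「反指標」 under the Traditional Chinese code page CP950.

The so-called ANSI encoding is exactly this kind of thing. If you chose Traditional Chinese as the language when you installed your OS, the machine's Active CodePage is set to CP950, so the value 0xA4CF in a document is read as '反'.

If your OS is Japanese, the same document with the same value 0xA4CF is no longer read as '反'. Under the Japanese code page (CP932) it comes out as half-width Japanese characters instead: a half-width punctuation mark followed by a half-width kana.

If your OS is English, your Active CodePage is CP-1252.

Back in the DOS days you could also choose which language the OS used, and it is the code page of that DOS environment that is called the OEM CodePage. For English DOS it is 437, not 1252.
http://msdn.microsoft.com/en-us/goglobal/cc305156

In any case, a text file (or a string) carries no encoding attribute of its own. Whatever your OS is set to is the encoding it will be decoded with.

ANSI is what matters on Windows today. Whenever you see anything that mentions OEM, just assume it has nothing to do with you: it is an old thing you will not need, and that is enough.

Strictly speaking, the real mainstream is Unicode; it is just that Windows saves text files in the ANSI encoding by default. When Windows opens a text document, it first checks whether the file is a Unicode text file. If it is ANSI-encoded, it is decoded according to the OS code page as described above. If it is Unicode-encoded, it shows the same content no matter which language the OS uses, because everyone is using the same encoding.

For the details, google it yourself.

######## Addendum

According to this article
http://support.microsoft.com/kb/65124/zh-tw

One of the main problems developers face when writing international Windows-based applications is handling characters sets. It is very important to understand ANSI and OEM. ANSI is the character set used internally by Windows and its applications. Windows does not recognize any character set other than ANSI. OEM is defined by Windows as the character set used by MS-DOS. The term "OEM" does not refer to a specific character set; instead, it refers to any of the different character sets (code pages) that can be installed and used by MS-DOS. Because Windows runs on top of MS-DOS, there must be a layer between Windows and MS-DOS that performs translations between ANSI and OEM. When Windows is first installed, the Windows Setup program looks at the MS-DOS-installed character set, and then installs the correct ANSI-OEM translation tables and Windows OEM fonts. Windows-based applications should use the Windows functions AnsiToOem() and OemToAnsi() when transferring information to and from MS-DOS. Also, applications should use the correct character set when creating filenames.

First, ignore the part about filenames. Only old file systems use anything other than Unicode there; ever since NTFS became mainstream, filenames have been stored internally as Unicode.

The article says "Windows runs on top of MS-DOS". That must describe the situation around the old Windows 98 era. Today's Windows has nothing to do with MS-DOS at all.

The Command Prompt in today's Windows is just a program that ships with the system. You can loosely compare it to VMware or VirtualBox, a box that imitates DOS, although strictly speaking it is a native Windows console window that looks like DOS rather than a real DOS machine. Do not expect things inside it to support Unicode as completely and smoothly as a terminal on Linux. It is a bare-bones DOS look-alike; letting you relive a DOS-era H-game is about the most you can ask of it.

On Traditional Chinese Windows this console uses OEM codepage 950 by default
http://msdn.microsoft.com/en-us/goglobal/cc305155
(for Traditional Chinese the code page is CP950 for both OEM and ANSI)
so printf("你好"); displays the Chinese correctly under the default settings.

You can switch to English with chcp 437, or to UTF-8 with chcp 65001. The point stands, though: this console is a bare-bones product, with bugs and places where it behaves oddly. For example, type chcp 936 and, sorry, it will not switch to the Simplified Chinese used in mainland China.

So, since there is no MS-DOS any more, there is no need to convert to OEM and back. Nobody is playing that game with you now. Go and see how many posts on Google still talk about these two functions, and whether there are more posts discussing "我們發財了" ("we struck it rich") or more discussing these two functions.

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 124.8.134.73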
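To see the code page point for yourself, here is a small Python sketch. It uses Python's built-in codecs rather than any Windows API. The names PHRASE, views, decode_under and ansi_to_oem are made up for this illustration, and ansi_to_oem is only a rough stand-in for the idea behind AnsiToOem(), not that function itself.

```python
# The bytes discussed in the post: 0xA4CF-0xABFC-0xBCD0
PHRASE = bytes([0xA4, 0xCF, 0xAB, 0xFC, 0xBC, 0xD0])
FIRST_CHAR = PHRASE[:2]  # 0xA4CF on its own


def decode_under(raw, code_pages):
    """Decode the same raw bytes under each code page name given."""
    # errors="replace" keeps the demo from raising on byte sequences
    # that a particular code page does not define.
    return {cp: raw.decode(cp, errors="replace") for cp in code_pages}


# Same two bytes, four different "OS language" settings.
views = decode_under(FIRST_CHAR, ["cp950", "cp932", "cp1252", "cp437"])
for cp, text in views.items():
    print(cp, repr(text), "characters:", len(text))


def ansi_to_oem(data, ansi_cp="cp1252", oem_cp="cp437"):
    """Hypothetical helper: reinterpret ANSI bytes as text, then
    re-encode that text in the OEM code page."""
    return data.decode(ansi_cp).encode(oem_cp, errors="replace")


# English Windows: the byte for 'é' differs between ANSI (1252) and OEM (437).
print(ansi_to_oem(b"\xe9").hex())
# Traditional Chinese: ANSI and OEM are both CP950, so nothing changes.
print(ansi_to_oem(PHRASE, "cp950", "cp950") == PHRASE)
```

Only under cp950 do the two bytes come out as a single character; the other code pages each treat them as two separate characters. The last two lines show why the conversion functions only matter when the ANSI and OEM code pages actually differ.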

Push  MOONRAKER  09/18 19:31  1F

Push  jonce007  09/18 22:49  2F

→  purpose  09/18 23:27  3F