[Question] Machine translation

Board: Linguistics  Author: (Si c'etait possible...)  Time: 2019/03/22 22:18
I'm currently reading an article on machine translation, and there is one short passage I just can't make sense of. I'd be grateful for any guidance from the senior members of this board.

Error classification and annotation is carried out when the focus is on understanding the types of errors produced by an MT system and their frequency. An example of an error typology for the evaluation of MT is proposed by Vilar, Xu, d'Haro, and Ney. This form of evaluation was particularly useful when the dominant MT paradigm was rule based; that is, it was possible to "code" linguistic rules for the transfer of words, phrases, and grammatical structures from one source language into a target language. The use of error typology for the more recent data-driven or statistical machine translation is more limited because, in this case, the nature and volume of the data, as opposed to formal linguistic rules, dictate the output to a large extent.

As I understand it, error classification means sorting the parts the machine got wrong into categories, for example missing word, incorrect word order, and so on. This approach is not suited to statistical machine translation (SMT can learn: given a set of parallel source/translation pairs, the machine analyzes them, derives some kind of rule or formula, and can then translate similar input the next time it comes up).

The part I don't understand is "the nature and volume of the data dictate the output to a large extent". Given how SMT works, the volume of data is the source from which the rules are induced, so it clearly matters for whether a translation gets produced at all. But what does "nature" (the type of data?) have to do with it?

--
※ Posted from: PTT (ptt.cc), from: 36.235.13.246
※ Article URL: https://www.ptt.cc/bbs/Linguistics/M.1553264337.A.AE3.html
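As a rough illustration of why both the nature and the volume of the training data can shape the output of a data-driven system, here is a minimal sketch. It is not a real SMT system: it assumes the word pairs are already aligned and only estimates p(target | source) by relative frequency. The function names `train_lexicon` and `translate`, and the toy English-to-French corpora, are all hypothetical and made up for this example.

```python
from collections import Counter, defaultdict


def train_lexicon(aligned_pairs):
    """Estimate p(target | source) by relative frequency.

    aligned_pairs: iterable of (source_word, target_word) tuples,
    assumed to be already word-aligned (a big simplification).
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(c.values()) for tgt, n in c.items()}
        for src, c in counts.items()
    }


def translate(word, lexicon):
    """Return the most probable target word, or None if the word is unseen."""
    options = lexicon.get(word)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical toy corpora: same source word, different domains.
# This is the "nature" of the data.
finance_data = [("bank", "banque")] * 9 + [("bank", "rive")] * 1
river_data = [("bank", "rive")] * 9 + [("bank", "banque")] * 1

finance_model = train_lexicon(finance_data)
river_model = train_lexicon(river_data)

print(translate("bank", finance_model))  # choice driven by finance-domain counts
print(translate("bank", river_model))    # choice driven by river-domain counts
print(translate("loan", finance_model))  # never seen in the data: no output
```

In this sketch no linguistic rule is written anywhere. The same code picks a different translation for "bank" depending only on which kind of text it was trained on, and it produces nothing for a word the data never covered, which is one way to read "nature and volume".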

04/02 16:01, 1F
"nature" seems to mean the "properties" of the data? My first thought is how messy it is?
Article ID (AID): #1SbExHhZ (Linguistics)