[Exam] 108-1 李琳山, Introduction to Digital Speech Processing, Midterm

Board: NTU-Exam, Author: (UJ), Posted: 2021/06/27 06:22
Course name: Introduction to Digital Speech Processing (數位語音處理概論)
Course type: Elective (EE / CSIE)
Instructor: 李琳山
College: College of Electrical Engineering and Computer Science
Department: Department of Electrical Engineering
Exam date: 108.11.09
Time limit: 120 minutes

Exam questions:

Note: some of the mathematical expressions below are written in LaTeX syntax.

1. (8 pts) What is GMM? How do we use it with HMM for continuous speech
   recognition?

2. (12 pts) Given an HMM with parameters \lambda = (A, B, \pi), an observation
   sequence \bar{O} = o_1,...,o_t,...,o_T and a state sequence
   \bar{q} = q_1,...,q_t,...,q_T, define

      \alpha_t(i) = Prob[o_1,...,o_t, q_t = i | \lambda]
      \beta_t(i) = Prob[o_{t+1},...,o_T | q_t = i, \lambda]

   We usually assume Prob[\bar{O}, q_t = i | \lambda] = \alpha_t(i)\beta_t(i).

   (3 pts) Show that Prob(\bar{O} | \lambda) = \sum_{i=1}^N [\alpha_t(i)\beta_t(i)].
   (3 pts) Show that Prob(q_t = i | \bar{O}, \lambda) =
           \frac{\alpha_t(i)\beta_t(i)}{\sum_{i=1}^N [\alpha_t(i)\beta_t(i)]}.
   (6 pts) Formulate and describe the procedure of the Viterbi algorithm for
           finding the best state sequence \bar{q}^* = q_1^*,...,q_t^*,...,q_T^*.

3. (10 pts) Please explain how the LBG algorithm and the K-means algorithm work,
   respectively. Does the K-means algorithm always yield the same result
   regardless of different initialization?

4. (10 pts) While training triphone acoustic models, data and parameter sharing
   is a common approach to ensure that there is enough data to train each
   acoustic model. Such sharing usually occurs at the state level. Please
   explain what this means.

5. (15 pts) You are taking an adventure in the Mabao forest. There are only four
   kinds of animals in the forest: otters, foxes, squirrels and duckbills. You
   know that the population percentages of the four kinds are 30%, 20%, 40% and
   10%, respectively. One morning, you see a brown-colored creature with white
   stripes on its back and a black tail run away swiftly, but it is so sudden
   that you cannot clearly recognize which species it is.
   Luckily, from a previous study you have the probability of each of the three
   characteristics observed on each of the four species, listed in Table 1,
   where o_1, o_2, o_3 refer to "brown-colored", "white-striped" and
   "black-tailed". Moreover, you know that for each of the four species, the
   three characteristics occur independently, that is,
   \forall i \neq j, o_i \perp o_j | c_k. To make your guess more efficient, so
   that you can spend most of your time enjoying the wilderness, you decide to
   build a decision tree for animal classification based on the three questions:
   "Is it brown-colored?", "Does it have white stripes?" and "Does it have a
   black tail?". The decision tree has the shape shown in Figure 1. Please build
   up this decision tree by putting the three questions into the three nodes.
   What is the entropy reduction resulting from the uppermost node of the tree?
   You are allowed to leave the logarithmic terms in your answer instead of
   giving a numerical solution.

                    | p(o_1 | c_i) | p(o_2 | c_i) | p(o_3 | c_i)
     otter (c_1)    |     0.8      |     0.3      |     0.8
     fox (c_2)      |     0.1      |     0.3      |     0.4
     squirrel (c_3) |     0.2      |     0.7      |     0.4
     duckbill (c_4) |     0.8      |     0.3      |     0.2

     Table 1: The posterior probabilities of the three characteristics

                  question A
                 /(T)     \(F)
         question B       question C
         /(T)  \(F)       /(T)  \(F)
      class a  class b  class c  class d

     Figure 1: Sample decision tree

   (Hint: You do not need to actually compute the entropy of the whole tree.
   Instead, you should be able to come up with the "best" tree structure by
   simply looking at the posterior probabilities of the three characteristics.
   Trust your intuition!)

6. (10 pts) Explain: What is entropy? What is the perplexity of a language model
   with respect to a test corpus?

7. (10 pts) Explain the OOV problem and how this problem can be solved for
   high-frequency OOV words in Chinese.

8. (10 pts) Explain the following two things:

   (5 pts) What are the excitation and the formant structure? Which one is more
           important in speech recognition? Why?
   (5 pts) What is voiced speech? What is pitch? How is it related to the tones
           of Mandarin Chinese?

9. (6 pts) Describe the precise way of measuring the recognition errors between
   the following two strings in digit string recognition:

   (3 pts) with a as the reference and b the machine output
   (3 pts) with b as the reference and a the machine output

   (a) 52030325
   (b) 5940345

10. (9 pts) Explain what beam search is. What are the advantages of using it in
    a large-vocabulary continuous speech recognition system? What are the
    trade-offs in choosing the beam width for it?

Source: https://www.ptt.cc/bbs/NTU-Exam/M.1624746149.A.C30.html
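As a worked note on Question 9: the "precise way of measuring recognition errors" is dynamic-programming string alignment (Levenshtein distance). Below is a minimal Python sketch assuming unit costs for substitution, insertion and deletion; the function name `edit_distance` is ours, not the exam's.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions and deletions that turn
    `ref` into `hyp`, via the usual DP table d[i][j] = distance between
    ref[:i] and hyp[:j]."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i            # delete all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j            # insert all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),  # match/substitute
                          d[i-1][j] + 1,                          # delete
                          d[i][j-1] + 1)                          # insert
    return d[m][n]

print(edit_distance("52030325", "5940345"))  # prints 4
```

With unit costs the total distance is symmetric, so swapping reference and machine output in Question 9 leaves the count unchanged; what changes is the breakdown of error types, since every insertion in one direction becomes a deletion in the other.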
