[Exam] 103-1 陳炳宇 Computer Organization and Structure Final Exam

Board: NTU-Exam  Author: (翔子)  Time: 2015/01/16 17:22
Course name: Computer Organization and Structure
Course type: Required course, Information Management, sophomore year
Instructor: 陳炳宇
College: College of Management
Department: Department of Information Management
Exam date (Y/M/D): 2015/1/13
Time limit (minutes): 180
Reward requested: Yes (not granted unless explicitly stated)

Exam:

Computer Organization and Structure Final Exam. Date: 2015/1/13

1. (8%) We have a program core consisting of five conditional branches. The
   program core will be executed thousands of times. Below are the outcomes
   of each branch for one execution of the program core (T for taken, N for
   not taken).

   Branch 1: T-T-T
   Branch 2: N-N-N-N
   Branch 3: T-N-T-N-T-N
   Branch 4: T-T-T-N-T
   Branch 5: T-T-N-T-T-N-T

   Assume the behavior of each branch remains the same for each program core
   execution. For dynamic schemes, assume each branch has its own prediction
   buffer and each buffer is initialized to the same state before each
   execution. List the predictions for the following branch prediction
   schemes:
   a. Always taken
   b. Always not taken
   c. 1-bit predictor, initialized to predict taken
   d. 2-bit predictor, initialized to weakly predict taken
   What are the prediction accuracies?

2. (2%) What is the difference between a CPU and a GPU? What kinds of
   problems are GPUs suited to handle?

3. (12%) For a direct-mapped cache design with a 32-bit address, the
   following bits of the address are used to access the cache.

        Tag      Index    Offset
   a.   31-12    11-6     5-0
   b.   31-10    9-5      4-0

   a. What is the cache line size (in words)?
   b. How many entries does the cache have?
   c. What is the ratio of the total bits required for such a cache
      implementation to the data storage bits?

4. (15%) What is the average CPI of each of the following three schemes when
   executing the code sequence below? (Note: For the pipelined schemes, there
   are five stages: IF, ID, EX, MEM, and WB. We assume that reads and writes
   of the register file can occur in the same clock cycle, and that the stall
   circuits are available.)

   add $t3, $s1, $s2
   sub $t1, $s1, $s2
   lw  $t2, 100($t3)
   sub $s1, $t1, $t2

   a. single-cycle scheme
   b. pipelined scheme without data forwarding hardware
   c. pipelined scheme with data forwarding hardware (one path from EX/MEM to
      the ALU input and the other from MEM/WB to the ALU input) available

5. (8%) Consider the following code segment in C:

   A = B + E;
   C = B + F;

   Here is the generated MIPS code for this segment, assuming all variables
   are in memory and are addressable as offsets from $t0:

   lw  $t1, 0($t0)
   lw  $t2, 4($t0)
   add $t3, $t1, $t2
   sw  $t3, 12($t0)
   lw  $t4, 8($t0)
   add $t5, $t1, $t4
   sw  $t5, 16($t0)

   Find the hazards in the code segment and reorder the instructions to avoid
   any pipeline stalls.

6. (20%) A majority function is generated in a combinational circuit when the
   output is equal to 1 if the input variables have more 1's than 0's. The
   output is 0 otherwise.
   a. Please write the truth table for a 4-input majority function.
   b. What is the function in sum-of-products form? (You may just use the
      "little m" notation.)
   c. Please use a Karnaugh map to find the minimum sum-of-products form and
      the minimum sum-of-products form for the complement.
   d. Please draw the logic schematic using AND, OR, and INVERT gates.

7. (15%) Assume the three caches below, each consisting of 16 words. Given
   the series of address references as word addresses: 2, 3, 4, 16, 18, 16,
   4, 2. Please label each reference as a hit or a miss for the three caches
   (a), (b), and (c) below. Assume that LRU is used as the cache replacement
   algorithm and that all the caches are initially empty.
   a. A direct-mapped cache with 16 one-word blocks
   b. A direct-mapped cache with 4 four-word blocks
   c. A four-way set-associative cache with a block size of one word

8. (10%) Suppose we have a processor with a base CPI of 1.0, assuming all
   references hit in the primary cache, and a clock rate of 5 GHz. Assume a
   main memory access time of 100 ns, including all the miss handling.
   Suppose the miss rate per instruction at the primary cache is 2%. How much
   faster will the processor be if we add a secondary cache that has a 5 ns
   access time for either a hit or a miss and is large enough to reduce the
   miss rate to main memory to 0.6%?

9. (10%) Please describe Amdahl's law for parallel computing and use it to
   answer the following question: a task has 60% of its work parallelizable;
   what is the speedup if it runs on 10 processors?

--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 140.112.7.214
※ Article URL: https://www.ptt.cc/bbs/NTU-Exam/M.1421400170.A.BFF.html
※ Edited: h999342 (140.112.25.106), 01/19/2015 09:27:04
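The arithmetic behind questions 8 and 9 can be checked with a short script. This is only a sketch and not an official solution. The function names `amdahl_speedup` and `effective_cpi` are made up for illustration. For question 8 it assumes that the 0.6% figure is a global miss rate per instruction and that every primary-cache miss pays the full 5 ns secondary-cache access.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def effective_cpi(base_cpi, clock_ghz, levels):
    """Base CPI plus memory stall cycles per instruction.

    levels is a list of (misses_per_instruction, penalty_ns) pairs.
    Assumption: miss rates are global (per instruction), so the
    stall terms simply add up.
    """
    cpi = base_cpi
    for misses_per_instruction, penalty_ns in levels:
        # ns * GHz gives the penalty in clock cycles
        cpi += misses_per_instruction * penalty_ns * clock_ghz
    return cpi


# Question 8: 5 GHz clock, 100 ns main memory, 5 ns secondary cache.
cpi_l1_only = effective_cpi(1.0, 5.0, [(0.02, 100.0)])
cpi_with_l2 = effective_cpi(1.0, 5.0, [(0.02, 5.0), (0.006, 100.0)])
print(f"Q8 CPI without L2: {cpi_l1_only:.2f}")
print(f"Q8 CPI with L2:    {cpi_with_l2:.2f}")
print(f"Q8 speedup:        {cpi_l1_only / cpi_with_l2:.2f}")

# Question 9: 60% parallelizable work on 10 processors.
print(f"Q9 speedup:        {amdahl_speedup(0.6, 10):.2f}")
```

Under these assumptions the script gives a CPI of 11.0 without the secondary cache and 4.5 with it, so a speedup of about 2.4 for question 8, and a speedup of about 2.17 for question 9.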

02/13 00:41, 1F
Archived under the Department of Electrical Engineering

02/13 00:42, 2F
Correction: Department of Information Management
Article ID (AID): #1KkDXgl_ (NTU-Exam)