[Past Exam] Data Mining and Data Warehousing / 林明言 / 972 Midterm

Board: FCUProblems    Author: a761007 (Daniel)    Date: 2009/04/20 20:27


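[College]: College of Information and Electrical Engineering (資電學院)
[Department]: Department of Information Engineering (資訊系)
[Course]: Data Mining and Data Warehousing
[Instructor]: Prof. 林明言
[Semester]: 972
[Type]: 97-2 midterm exam

1. [8%] The itemsets and their support values are: {a}:0.9, {b}:0.4, {c}:0.8, {e}:0.6, {a,b}:0.65, {a,c}:0.75, {a,e}:0.55, {b,c}:0.6, {b,e}:0.5, {c,e}:0.48, {a,b,e}:0.4, {b,c,e}:0.45, {a,b,c,e}:0.3, etc.
   (a) Association rule {a,c}→{b,c}: support = ___, confidence = ___.
   (b) Association rule {b,c}→{a,e}: support = ___, confidence = ___.

2. [3, 3%] Suppose the frequent itemsets L3 consist only of {1,2,3}, {1,2,4}, {1,3,5}, {2,3,5}, {2,4,5}, {3,4,5}.
   (a) The candidate itemsets C4 produced by the join step (before pruning): ___________
   (b) The possible C4 after pruning: ___________

3. [2, 4, 5%] Using the data in Table 1 (writing out the formulas is sufficient), compute:
   (a) The information of Class = ________
   (b) The entropy of attribute ORS = ___________
   (c) Suppose the record "ID=12, ORS=R, Class=3" is added to Table 1. The information gain of attribute ORS = __________________

   Table 1
   ID   ORS  Class
   1    R    1
   2    S    1
   3    S    1
   4    R    2
   5    R    2
   6    S    2
   7    O    2
   8    O    2
   9    O    2
   10   R    2
   11   O    2

4. [2.5, 2.5, 2, 3%]
   (a) Lift of the X-model = _________
   (b) Lift of the Y-model = _________
   (c) Error rate of the X-model = _________
   (d) Recall of the Y-model = _________

   X-model   Computed-Accept  Computed-Reject
   Accepts   45               55
   Reject    1955             7945

   Y-model   Computed-Accept  Computed-Reject
   Accepts   46               54
   Reject    2245             7955

5. [2, 4, 4%]
   (a) List the symmetric attributes in Table 3: _____________
   (b) Ignoring the symmetric attributes in Table 3, compute the dissimilarities d(A,C) and d(B,C):
       d(A,C) = _________   d(B,C) = ___________
   (c) The attribute weights in Table 3 are symmetric : asymmetric = 1 : 4. Compute d(A,B) = ______

   Table 3
   Patient  Gender  Region  Fever  Cough  Test-1  Test-2  Test-3  Test-4
   A        M       North   Y      N      P       N       N       N
   B        F       North   Y      N      P       P       P       N
   C        M       South   Y      Y      N       N       N       P
   D        M       North   N      N      N       N       N       P
   Note: a patient's "Region" is South or North with probability 1/2 each.

6. [4, 4, 4%] Given the experimental data (34, 29, 28, 26, 4, 8, 14, 15, 20, 10, 3, 15):
   (a) List the contents of each bin after equi-depth partitioning (depth = 4).
   (b) List the contents of each bin after smoothing by bin boundaries.
   (c) Apply min-max normalization to the original data with the new range [1, 10]. After normalization, 8 and 28 become ___, ___.

7. [2, 2%] For the decision tree in the figure below:
   (a) How many rules can be generated?
   (b) Write down one rule.
   P.S. The photo of the figure is too blurry to reproduce, so it is not included; this question is similar to the textbook examples.

8. [5%] Given the existing hash tree (node size = 3) and subset function shown below, draw the tree after inserting (3,4,7) and (3,5,8).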
   [Figure: existing hash tree and its subset function (branches 1,4,7 | 2,5,8 | 3,6,9). Leaf itemsets legible in the post: {1,4,5}, {1,3,6}, {5,6,7}, {3,4,5}, {3,6,7}, {3,6,8}, {3,5,6}, {1,2,4}, {1,2,5}, {1,5,9}, {3,5,7}, {4,5,7}, {4,5,8}, {6,8,9}, {2,3,4}; the tree layout itself is not recoverable.]

9. [4, 4%] The frequent itemsets and their counts are: {a}:9, {b}:8, {d}:9, {e}:6, {a,b}:7, {a,d}:9, {a,e}:5, {b,d}:8, {d,e}:5, {a,d,e}:5
   (a) Maximal itemsets = ____________
   (b) Closed itemsets = ____________

10. [6, 4, 4, 4, 5, 5%]
   (a) What is data mining?
   (b) What is the difference between training data and test data?
   (c) What is the main difference between clustering and classification?
   (d) Give two examples of a "concept hierarchy".
   (e) Briefly describe K-means clustering.
   (f) What steps does the KDD process consist of?

--
※ Posted from: PTT (批踢踢實業坊, ptt.cc) ◆ From: 59.126.206.66
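For Question 1, the support of a rule X→Y is s(X∪Y) and its confidence is s(X∪Y)/s(X). A minimal Python sketch using only the support values listed in the question; `rule_metrics` is a hypothetical helper name. Part (a) needs s({a,b,c}), which is not in the listed values, so the sketch returns None instead of guessing.

```python
# Support values exactly as listed in Question 1 (the list in the question ends with "etc.")
SUPPORT = {
    frozenset("a"): 0.9, frozenset("b"): 0.4, frozenset("c"): 0.8, frozenset("e"): 0.6,
    frozenset("ab"): 0.65, frozenset("ac"): 0.75, frozenset("ae"): 0.55,
    frozenset("bc"): 0.6, frozenset("be"): 0.5, frozenset("ce"): 0.48,
    frozenset("abe"): 0.4, frozenset("bce"): 0.45, frozenset("abce"): 0.3,
}

def rule_metrics(lhs, rhs):
    """Return (support, confidence) of lhs -> rhs, or None if a needed support is not listed."""
    x = frozenset(lhs)
    union = x | frozenset(rhs)
    if union not in SUPPORT or x not in SUPPORT:
        return None
    sup = SUPPORT[union]            # support(X -> Y) = s(X ∪ Y)
    return sup, sup / SUPPORT[x]    # confidence(X -> Y) = s(X ∪ Y) / s(X)

print("(a)", rule_metrics("ac", "bc"))  # X ∪ Y = {a,b,c}, whose support is not listed
print("(b)", rule_metrics("bc", "ae"))  # X ∪ Y = {a,b,c,e}
```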
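For Question 2, the Apriori join step merges two sorted 3-itemsets that share their first two items, and the prune step drops any 4-candidate that has an infrequent 3-subset. A minimal sketch of both steps on the given L3; `apriori_join` and `apriori_prune` are hypothetical helper names.

```python
from itertools import combinations

L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)]

def apriori_join(frequent):
    """Join step: merge two sorted k-itemsets that agree on their first k-1 items."""
    frequent = sorted(frequent)
    out = []
    for i, p in enumerate(frequent):
        for q in frequent[i + 1:]:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                out.append(p + (q[-1],))
    return out

def apriori_prune(candidates, frequent):
    """Prune step: keep a candidate only if every k-subset of it is frequent."""
    known = set(frequent)
    k = len(frequent[0])
    return [c for c in candidates if all(s in known for s in combinations(c, k))]

joined = apriori_join(L3)
print("C4 after join :", joined)
print("C4 after prune:", apriori_prune(joined, L3))
```

Only {1,2,3} and {1,2,4} share a 2-item prefix, so the join yields {1,2,3,4}; its subset {1,3,4} is not in L3, so pruning removes it.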
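For Question 3, a minimal sketch of the ID3-style quantities with base-2 logarithms: Info(D) for Class, Info_ORS(D) (the class entropy of each ORS value weighted by its share of records), and Gain(ORS) = Info(D) − Info_ORS(D). Reading "entropy of attribute ORS" as Info_ORS(D) is an assumption; the helper names are hypothetical.

```python
from collections import Counter
from math import log2

# Table 1 as (ORS, Class) pairs for IDs 1..11
TABLE1 = [("R", 1), ("S", 1), ("S", 1), ("R", 2), ("R", 2), ("S", 2),
          ("O", 2), ("O", 2), ("O", 2), ("R", 2), ("O", 2)]

def info(labels):
    """Info(D) = -sum(p_i * log2 p_i) over the class distribution."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_attr(rows):
    """Info_A(D): entropy within each attribute value, weighted by |D_j| / |D| (assumed reading of part (b))."""
    n = len(rows)
    groups = {}
    for value, label in rows:
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * info(g) for g in groups.values())

def gain(rows):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([label for _, label in rows]) - info_attr(rows)

print("(a) Info(Class) =", round(info([c for _, c in TABLE1]), 4))
print("(b) Info_ORS(D) =", round(info_attr(TABLE1), 4))
print("(c) Gain(ORS)   =", round(gain(TABLE1 + [("R", 3)]), 4))  # with record ID=12 added
```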
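For Question 4, a minimal sketch that assumes the matrix rows are the actual classes, the columns are the model's output, and "Accepts" is the positive class. It also assumes lift = precision ÷ overall proportion of Accepts, which is one common definition; the course may define lift differently. `metrics` is a hypothetical helper.

```python
def metrics(tp, fn, fp, tn):
    """Rows: actual Accepts / Reject; columns: Computed-Accept / Computed-Reject (assumed)."""
    n = tp + fn + fp + tn
    precision = tp / (tp + fp)   # share of computed accepts that are actual accepts
    base_rate = (tp + fn) / n    # share of actual accepts in the whole data set
    return {
        "lift": precision / base_rate,   # assumed definition of lift
        "error_rate": (fn + fp) / n,     # misclassified records / all records
        "recall": tp / (tp + fn),        # actual accepts that the model finds
    }

x = metrics(tp=45, fn=55, fp=1955, tn=7945)
y = metrics(tp=46, fn=54, fp=2245, tn=7955)
print("(a) X lift      :", round(x["lift"], 4))
print("(b) Y lift      :", round(y["lift"], 4))
print("(c) X error rate:", x["error_rate"])
print("(d) Y recall    :", y["recall"])
```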
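For Question 5, a minimal sketch that treats Gender and Region as symmetric (Region's two values are equally likely per the note; Gender is assumed symmetric as well) and Fever, Cough, and Test-1..4 as asymmetric binary attributes with Y/P = 1 and N = 0. Asymmetric dissimilarity is (r + s) / (q + r + s), ignoring 0/0 matches. For part (c), the 1:4 ratio is assumed to be a per-attribute weight in a weighted average of per-attribute dissimilarities; `d_asym` and `d_weighted` are hypothetical names.

```python
# Table 3; asymmetric attributes in the order Fever, Cough, Test-1..4, coded Y/P -> 1, N -> 0
PATIENTS = {
    "A": {"gender": "M", "region": "North", "asym": [1, 0, 1, 0, 0, 0]},
    "B": {"gender": "F", "region": "North", "asym": [1, 0, 1, 1, 1, 0]},
    "C": {"gender": "M", "region": "South", "asym": [1, 1, 0, 0, 0, 1]},
    "D": {"gender": "M", "region": "North", "asym": [0, 0, 0, 0, 0, 1]},
}
SYMMETRIC = ("gender", "region")  # assumed answer to part (a)

def d_asym(i, j):
    """(r + s) / (q + r + s): mismatches over all pairs except negative (0, 0) matches."""
    pairs = list(zip(PATIENTS[i]["asym"], PATIENTS[j]["asym"]))
    q = sum(1 for a, b in pairs if a == b == 1)
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / (q + mismatches)

def d_weighted(i, j, w_sym=1, w_asym=4):
    """Weighted mean of per-attribute 0/1 dissimilarities (assumed weighting scheme)."""
    num = den = 0
    for attr in SYMMETRIC:
        num += w_sym * (PATIENTS[i][attr] != PATIENTS[j][attr])
        den += w_sym
    for a, b in zip(PATIENTS[i]["asym"], PATIENTS[j]["asym"]):
        if a == b == 0:
            continue  # a negative match on an asymmetric attribute carries no weight
        num += w_asym * (a != b)
        den += w_asym
    return num / den

print("(b) d(A,C) =", d_asym("A", "C"), " d(B,C) =", round(d_asym("B", "C"), 4))
print("(c) d(A,B) =", d_weighted("A", "B"))
```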
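For Question 6, a minimal sketch of the three preprocessing steps: sort and cut into bins of 4 values, replace each value by the nearer of its bin's min and max, and min-max normalize with v' = (v − min) / (max − min) × (new_max − new_min) + new_min. Sending ties to the lower boundary is an assumption (no ties occur in this data); the helper names are hypothetical.

```python
DATA = [34, 29, 28, 26, 4, 8, 14, 15, 20, 10, 3, 15]

def equi_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` values."""
    s = sorted(values)
    return [s[i:i + depth] for i in range(0, len(s), depth)]

def smooth_by_boundaries(bin_):
    """Replace each value by the closer bin boundary (ties go to the lower one, by assumption)."""
    lo, hi = bin_[0], bin_[-1]  # bins are sorted, so the ends are the boundaries
    return [lo if v - lo <= hi - v else hi for v in bin_]

def min_max(v, old_min, old_max, new_min, new_max):
    """Linearly map [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

bins = equi_depth_bins(DATA, 4)
print("(a)", bins)
print("(b)", [smooth_by_boundaries(b) for b in bins])
lo, hi = min(DATA), max(DATA)
print("(c)", [round(min_max(v, lo, hi, 1, 10), 4) for v in (8, 28)])
```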
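For Question 8, a minimal sketch of candidate insertion into a hash tree. It rests on several assumptions: the figure is rebuilt from the leaf itemsets legible in the post, the subset function sends items 1,4,7 / 2,5,8 / 3,6,9 to branches 0 / 1 / 2, and a leaf splits once it holds more than 3 itemsets, unless all three items have already been hashed. `Node`, `bucket`, and the branch numbering are hypothetical names for illustration.

```python
MAX_LEAF = 3  # node size given in the question
K = 3         # length of the candidate itemsets

def bucket(item):
    """Assumed subset function: 1,4,7 -> 0; 2,5,8 -> 1; 3,6,9 -> 2."""
    return (item - 1) % 3

class Node:
    def __init__(self, depth):
        self.depth = depth     # number of items already hashed to reach this node
        self.children = None   # dict: branch -> Node, set once the node splits
        self.itemsets = []     # candidates stored while the node is a leaf

    def insert(self, itemset):
        if self.children is not None:
            branch = bucket(itemset[self.depth])
            self.children.setdefault(branch, Node(self.depth + 1)).insert(itemset)
            return
        self.itemsets.append(itemset)
        # Split an overfull leaf by hashing the next item, if one is left
        if len(self.itemsets) > MAX_LEAF and self.depth < K:
            pending, self.itemsets, self.children = self.itemsets, [], {}
            for s in pending:
                self.insert(s)

    def leaves(self, path=()):
        """Map each leaf's branch path to its sorted itemsets."""
        if self.children is None:
            return {path: sorted(self.itemsets)}
        out = {}
        for branch, child in sorted(self.children.items()):
            out.update(child.leaves(path + (branch,)))
        return out

# Leaf itemsets legible in the figure residue
CANDIDATES = [(1, 4, 5), (1, 3, 6), (5, 6, 7), (3, 4, 5), (3, 6, 7), (3, 6, 8),
              (3, 5, 6), (1, 2, 4), (1, 2, 5), (1, 5, 9), (3, 5, 7), (4, 5, 7),
              (4, 5, 8), (6, 8, 9), (2, 3, 4)]

root = Node(0)
for c in CANDIDATES:
    root.insert(c)
for new in [(3, 4, 7), (3, 5, 8)]:
    root.insert(new)
for path, items in root.leaves().items():
    print(path, items)
```

Under these assumptions, (3,4,7) joins the leaf holding {3,4,5}, while (3,5,8) overfills the leaf holding {3,5,6}, {3,5,7}, {6,8,9}, which then splits on the third item.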
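For Question 9, a frequent itemset is maximal if no proper superset is frequent, and closed if no proper superset has the same count. A minimal sketch that checks both conditions against the listed itemsets only, which assumes the list is the complete set of frequent itemsets.

```python
COUNTS = {"a": 9, "b": 8, "d": 9, "e": 6, "ab": 7, "ad": 9, "ae": 5,
          "bd": 8, "de": 5, "ade": 5}
FREQ = {frozenset(k): v for k, v in COUNTS.items()}

def proper_supersets(x):
    """Listed frequent itemsets that strictly contain x."""
    return [y for y in FREQ if x < y]

def label(x):
    return "".join(sorted(x))

# Maximal: no frequent proper superset at all
maximal = sorted(label(x) for x in FREQ if not proper_supersets(x))
# Closed: every proper superset has a strictly smaller count
closed = sorted(label(x) for x in FREQ
                if all(FREQ[y] < FREQ[x] for y in proper_supersets(x)))
print("(a) maximal:", maximal)
print("(b) closed :", closed)
```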
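For Question 10(e), K-means picks k initial centroids, assigns every point to its nearest centroid, recomputes each centroid as the mean of its cluster, and repeats until the centroids stop changing. A minimal 1-D sketch; reusing the Question 6 data with k = 2 and initial centroids 3 and 34 is an arbitrary illustrative choice, and `kmeans_1d` is a hypothetical name.

```python
DATA = [34, 29, 28, 26, 4, 8, 14, 15, 20, 10, 3, 15]

def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data, starting from caller-chosen centroids."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return centroids, clusters

centroids, clusters = kmeans_1d(DATA, [3.0, 34.0])
print("centroids:", centroids)
print("clusters :", clusters)
```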

Article ID (AID): #19x6eySP (FCUProblems)