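[Exam] 101-1 唐牧群 Information Retrieval Final Exam
Course name: Information Retrieval
Course type: Required
Instructor: 唐牧群
College: College of Liberal Arts
Department: Department of Library and Information Science
Exam date (Y/M/D): 2013/1/9
Time limit (minutes): 180 minutes
Reward requested: Yes
(If not explicitly stated, no reward will be issued)
Exam questions: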
1. Consider an imaginary database that contains only the following 5 documents:
D1: Shipment silver damaged truck
D2: Delivery silver arrived silver car
D3: Shipment gold damaged fire
D4: Delivery gold damaged fire
D5: Shipment car arrived truck
(Terms in the stop word list have been grayed out).
Please
(1). Create an inverted file that lists all the terms in alphabetical order, where
each cell contains the TF (Term Frequency) weight of the term in each document
(5 points).
(2). Calculate the DF (Document Frequency) and IDF (Inverse Document Frequency)
weight for each index term (simply use N/n without the logarithm) (5 points).
(3). Give the ranking after the user submits the query "gold silver truck"
(10 points)
(4). After the first iteration, the user examines the results and marks D2 and D3
as relevant. Produce the new ranking using Rocchio's method where all the
parameters equal 1 (10 points).
(5). With the same query and relevance information, calculate the new query term
weights for "gold", "silver", "truck" according to the Robertson and Sparck Jones
weighting method (hint: first you need to decide the values of "N", "R", "n",
"r") (10 points).
2. Unlike data retrieval, where perfect precision and recall are guaranteed,
information retrieval is more of a probabilistic process in which the information
conveyed in the retrieved documents might or might not answer the user's
information needs. What are the possible causes behind the uncertainty of IR
(10 points)?
3. Define the following concepts and explain how they are related to one
another: "specificity", "precision" and "IDF (Inverse Document Frequency)";
"exhaustivity", "recall" and "TF (Term Frequency)". There is often a trade-off
between precision and recall; is there also a trade-off between specificity and
exhaustivity? (20 points)
4. Explain the three basic models in information retrieval: Boolean, Vector
space and Probabilistic (20 points).
5. Explain the rationale behind PageRank and the meaning of each component of
the formula below (10 points).
PR(A) = (1 - d) + d * Σ_i [ PR(I_i) / C(I_i) ]
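
The formula above can be applied iteratively. The sketch below is a minimal illustration and not part of the exam: the three-page link graph, the damping factor d = 0.85, the starting value of 1.0 and the fixed iteration count are all assumptions, and the function name `pagerank` is made up. Here I_i ranges over the pages linking to A, and C(I_i) is the number of outgoing links on page I_i.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(I) / C(I)) over pages I linking to A.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    out_count = {p: len(links[p]) for p in pages}  # C(I) for each page
    pr = {p: 1.0 for p in pages}                   # starting value (assumption)
    for _ in range(iterations):
        # All new values are computed from the previous iteration's values.
        pr = {
            a: (1 - d) + d * sum(pr[i] / out_count[i]
                                 for i in pages if a in links[i])
            for a in pages
        }
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
print(ranks)
```

In this toy graph C collects links from both A and B, so it ends up with the highest score, and with this unnormalized form of the formula the scores sum to the number of pages.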
--
※ Origin: PTT (ptt.cc)
◆ From: 140.112.25.108
※ Edited: mandy080413 from: 140.112.25.108 (01/09 20:30)