Term
What is the key to determining the effectiveness of an IR system? |
|
Definition
Relevance
- Relevance is (generally) a subjective notion
- Relevance is defined in terms of a user's information need
- Users may differ in their assessment of the relevance of a document to a particular query
|
|
|
Term
Give a diagram of relevant vs. retrieved documents |
|
Definition
|
|
Term
What are two common statistics used to determine effectiveness? |
|
Definition
- Precision: What fraction of the returned results are relevant to the information need?
- Recall: What fraction of the relevant documents in the collection were returned by the system?
|
|
|
Term
Provide a table describing precision and recall |
|
Definition
|
              Retrieved    Not Retrieved
Relevant      R ∩ A        R ∩ Ā
Not Relevant  R̄ ∩ A        R̄ ∩ Ā

(R = set of relevant documents, A = answer set of retrieved documents)
Precision = |R ∩ A| / |A| = P(relevant | retrieved)
Recall = |R ∩ A| / |R| = P(retrieved | relevant)
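A minimal sketch of these two formulas in Python (the document IDs below are made up for illustration):

# Sketch: precision and recall from a relevant set R and an answer (retrieved) set A
R = {"d1", "d2", "d3", "d4"}          # relevant documents (illustrative)
A = {"d2", "d4", "d7"}                # retrieved documents (answer set)
hits = R & A                          # R ∩ A
precision = len(hits) / len(A)        # P(relevant | retrieved) = 2/3
recall = len(hits) / len(R)           # P(retrieved | relevant) = 1/2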
[image] |
|
|
Term
Provide the precision, recall, fallout and specificity of the following table:
|
              Retrieved    Not Retrieved
Relevant      30           20
Not Relevant  10           40
|
|
Definition
Collection of 100 documents:
- Precision = 30 / (30+10) = 75%
- Recall = 30 / (30+20) = 60%
- Fallout = 10 / (10+40) = 20%
- Specificity = 40 / (10 + 40) = 80%
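A quick check of the arithmetic in Python (counts taken from the table above):

# Sketch: the four measures from the 2x2 table (100 documents in total)
rel_ret, rel_not = 30, 20             # relevant: retrieved / not retrieved
irr_ret, irr_not = 10, 40             # not relevant: retrieved / not retrieved
precision   = rel_ret / (rel_ret + irr_ret)    # 30/40 = 0.75
recall      = rel_ret / (rel_ret + rel_not)    # 30/50 = 0.60
fallout     = irr_ret / (irr_ret + irr_not)    # 10/50 = 0.20
specificity = irr_not / (irr_ret + irr_not)    # 40/50 = 0.80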
|
|
|
Term
What are some characteristics of precision vs. recall? |
|
Definition
- Can trade off precision against recall
- A system that returns every document in the collection as its result set guarantees recall 100% but has low precision
- Returning a single document could give low recall but precision 100%
[image]
[image]
|
|
|
Term
Give an example of precision versus recall |
|
Definition
- An information retrieval system contains the following 10 documents that are relevant to query q
- d1, d2, d3, d4, d5,
- d6, d7, d8, d9, d10
- In response to q, the system returns the following ranked answer set containing fifteen documents (the relevant ones are d3, d1, d4, d5 and d9):
- d3, d11, d1, d23, d34,
- d4, d27, d29, d82, d5,
- d12, d77, d56, d79, d9
- Calculate the precision-recall curve at the eleven standard recall levels: 0%, 10%, ... , 100%
- Interpolate precision values as follows:
- P(rj) = max{ P(r) : r ≥ rj }
- Where rj is the jth standard recall level
- If only d3 is returned, then
- Precision = 1/1 = 100%
- Recall = 1/10 = 10%
- If d3,d11,d1 are returned
- Precision = 2/3 = 67%
- Recall = 2/10 = 20%
- If d3,d11,d1,d23,d34,d4 are returned
- Precision = 3/6 = 50%
- Recall = 3/10 = 30%
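A small Python sketch of this computation, using the ranked list and relevant set from this card (the interpolation takes, at each standard recall level, the maximum precision observed at any recall at or above it):

# Sketch: precision/recall points and 11-point interpolated precision
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}
ranking = ["d3", "d11", "d1", "d23", "d34",
           "d4", "d27", "d29", "d82", "d5",
           "d12", "d77", "d56", "d79", "d9"]
points = []                            # (recall, precision) at each relevant hit
hits = 0
for i, doc in enumerate(ranking, start=1):
    if doc in relevant:
        hits += 1
        points.append((hits / len(relevant), hits / i))
for j in range(11):                    # standard recall levels 0.0, 0.1, ..., 1.0
    level = j / 10
    p = max((prec for rec, prec in points if rec >= level), default=0.0)
    print(f"recall {level:.1f}: precision {p:.2f}")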
|
|
|
Term
What are the principles of accuracy? |
|
Definition
- Treats IR system as a two-class classifier (relevant/non-relevant)
- Measures the fraction of classifications that are correct
- Not a good measure - most documents in an IR system will be irrelevant to a given query (skew)
- Can maximise accuracy by considering all documents to be irrelevant
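In the notation of the earlier contingency table (a standard identity; the 100-document example is reused here for illustration):
- Accuracy = (|R ∩ A| + |R̄ ∩ Ā|) / N
- For the 100-document table above: (30 + 40) / 100 = 70%
- On a realistically skewed collection (say, 1% of documents relevant), labelling every document as non-relevant already gives 99% accuracy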
|
|
|
Term
What are the principles of F-measure? |
|
Definition
- Weighted harmonic mean of precision and recall
[image]
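A standard form of the weighted harmonic mean (presumably what the image above shows; β weights recall relative to precision):
- F = (β² + 1) · P · R / (β² · P + R)
- With β = 1 this is the balanced F1 measure: F1 = 2 · P · R / (P + R)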
|
|
|
Term
What is a receiver operating characteristic? |
|
Definition
- Plot curve with:
- True positive rate on y axis (Recall or Sensitivity)
- False positive rate on x axis (1 - specificity)
[image]
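A minimal Python sketch of how the ROC points can be computed by cutting a ranked result list at each position (the function and its arguments are illustrative, not from the slides):

# Sketch: ROC points (FPR, TPR) for each cut-off position in a ranked list
def roc_points(ranking, relevant, collection_size):
    irrelevant_total = collection_size - len(relevant)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for doc in ranking:
        if doc in relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / irrelevant_total,    # false positive rate = 1 - specificity
                       tp / len(relevant)))      # true positive rate = recall
    return points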
|
|
|
Term
What are some other effectiveness measures? |
|
Definition
- Reciprocal rank of first relevant result:
- If the first result is relevant, it gets a score of 1
- If the first two are not relevant but the third is, it gets a score of 1/3
- Time to answer measures how long it takes a user to find the desired answer to a problem
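A minimal Python sketch of the reciprocal-rank score for a single query (document IDs are illustrative):

# Sketch: reciprocal rank of the first relevant result
def reciprocal_rank(ranking, relevant):
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0                         # no relevant document returned

print(reciprocal_rank(["d11", "d23", "d3"], {"d3"}))   # first relevant at rank 3 -> 1/3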
|
|
|
Term
What are the characteristics of typical document collections? |
|
Definition
- 1E6 documents; 2-3GB of text
- Lexicon of 500,000 words after stemming and case folding; can be stored in ~10MB
- Inverted index is ~300MB (but can be reduced to roughly 100MB with compression)
|
|
|
Term
What are some characteristics of Web Scale? |
|
Definition
- Web search engines have to work at scales over four orders of magnitude larger (1.1E10+ documents in 2005)
- Index is divided into k segments on different computers
- Query is sent to the k computers in parallel
- The k result sets are merged into a single set shown to the user
- Handling thousands of queries per second requires n copies of the k computers
- Web search engines don't have complete coverage (Google was best at 8E9 documents in 2005)
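A toy Python sketch of the scatter/gather pattern described above; search_segment is a hypothetical per-segment search function, and a real engine distributes the work over machines rather than threads in one process:

# Toy sketch: query k index segments in parallel and merge the k result sets
from concurrent.futures import ThreadPoolExecutor

def search_all(query, segments, search_segment, top_n=10):
    # search_segment(query, segment) -> list of (doc_id, score) pairs (hypothetical)
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        result_sets = list(pool.map(lambda seg: search_segment(query, seg), segments))
    merged = [hit for result in result_sets for hit in result]
    merged.sort(key=lambda hit: hit[1], reverse=True)    # rank by descending score
    return merged[:top_n]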
|
|
|
Term
What are some characteristics of extended ranking? |
|
Definition
- So far we've considered the role that the document content plays in ranking
- Term weighting using TF-IDF
- Can also use document context - hypertext connectivity
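As a reminder of the content-based part, a minimal TF-IDF weighting sketch in Python (the exact TF and IDF variants differ between systems):

# Minimal sketch of TF-IDF term weighting over a small collection
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists; returns one {term: weight} dict per document
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)                                     # raw term frequency
        weights.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return weights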
|
|
|