| Term 
 
        | Draw the diagram for Query Processing |  | Definition 
 | 
        |  | 
        
        | Term 
 
        | What are the types of query plans and their characteristics? |  | Definition 
 
        | 
Logical query plan
Alegebraic representation of queryOperators taken from relational algebraIs abstract Physical query plan
Algorithms selected for each operator in planExecution order for specific operators |  | 
        |  | 
        
        | Term 
 
        | Please show in the diagra where does query optimization take place |  | Definition 
 | 
        |  | 
        
        | Term 
 
        | What does the 'selecting logical query plan' stage consist of? |  | Definition 
 
        | It takes place after the parse query, before select physical query plan, and it consists of 
Generate Logical Query planRewrite logical query plan |  | 
        |  | 
        
        | Term 
 
        | What are Physical Plan Operators? |  | Definition 
 
        | Algorithm that implements one of the basic relational operations that are used in query plans   e.g. Relational algebra has join operator. How the join is carried out depends on >Structure of relations, > Size of relations, > Presence of indexes and hashes, etc |  | 
        |  | 
        
        | Term 
 
        | What is the computational model for Pyhsical plan operators?   |  | Definition 
 
        | Need to choose good physical plan operators: 
Estimate "cost" of each operatorKey measure of cost is the number of disk accesses (far more costly than main memory accesses) Assumption: arguments of operator are on disk, result is in main memory |  | 
        |  | 
        
        | Term 
 
        | What are the cost parameters of Physical Plan operators? |  | Definition 
 
        | M: Main memory available for buffers S(R): Size of a tuple of relation R B(R): Blocks used to store relation R T(R): Number of tuples in relation R (Cardinality of R) V(R,a): Number of distinct values for attribute a in relation R |  | 
        |  | 
        
        | Term 
 
        | What is a clustered 1) File, 2) relation and 3) index? |  | Definition 
 
        | 
Tuples from different relations that can be joined (on particular attribute values) stored in blocks together.
[R1 R2 S1 S2] [R3 R4 S3 S4] Tuples from relation are stored together in blocks, but not necessarily sorted
[R1 R2 R3 R4] [R5 R6 R7 R8] Index allows tuples to be read in an order that corresponde to physical order |  | 
        |  | 
        
        | Term 
 
        | What does scanning consist of? |  | Definition 
 
        | 
e.g. Read all tuples of relation R (that satisfy some predicate)Two variants |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of Table scan? |  | Definition 
 
        | 
Tuples arranged in blocks
All blocks known to the systemPossible to get blocks one at a time I/O Cost
B(R) disk accesses, if R is clusteredT(R) disk accesses, if R is not clustered |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of Index Scan? |  | Definition 
 
        | 
An index exists on some attribute of R
Use index to find all blocks holding RRetreive blocks for R I/O Cost
B(R) + B(IR) disk accesses if clusteredB(R) >> B(IR), so treat as only B(R)T(R) disk accesses if not clustered |  | 
        |  | 
        
        | Term 
 
        | What are the main characteristics of one-pass algorithms? |  | Definition 
 
        | 
Read data from disk only onceTypically require that at least one argument fits in main memoryThree broad categories:
Unary, tuple at a time (i.e. select, project)Unary, full relation (i.e. duplicate elimination, grouping)Binary, full relation |  | 
        |  | 
        
        | Term 
 
        | Describe the one-pass unary, tuple at a time |  | Definition 
 
        | 
Algorithm:
Foreach block of R:
Copy block to input bufferPerform operation (select, project) on each tuple in blockMove selected/projected tuples to output buffer Cost:
In general, B(R) or T(R) disk accesses depending on clusteringIf operator is a select that compares an attribute to a constant and index exists for attributes used in select, <<B(R) disk accesses Requires M >= 1   |  | 
        |  | 
        
        | Term 
 
        | Describe the Unary, Full-relation, one pass algorithm and cost |  | Definition 
 
        | 
Algorithm (Normal):
foreach block of R
Copy block to input bufferupdate accumulatormove tuples to output buffer Algorithm (Duplicate Elimination)
foreach block of R:
copy block to input bufferforeach tuple in block
If tuple is not in accumulatorelse copy to accumulator Requires M >= B(d(R))+1 block of main memory
1 block for input bufferM-1 blocks for accumulator (records each tuple seen so far) Accumulator implemented as in-memory data structure (tree, hash)If fewer than B(d(R)) blocks available, thrashing likelyCost is B(R) disk accesses |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of grouping in a unary, full relation? |  | Definition 
 
        | Grouping operators: min, max, sum, count, avg 
Accumulator contains per-group valuesOutput contains per-group valuesCost is B(R) disk accesses |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of the Binary, Full-relation? |  | Definition 
 
        | Union, intersection, difference, product, join   In general, cost is B(R) + B(S). (R, S are operand relations)   Requirement for one pass operation: min(B(R), B(S)) < M-1 |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of the join, binary ful relation? |  | Definition 
 
        | 
Two relations, R(X,Y) and S(Y,Z), B(S)<B(R)Uses main memory search structure keyed on Y 
foreach block of S:
read blockadd tuples to search structure foreach block of R:
copy block to input bufferforeach tuple in block:
Find matching tuples in search structureConstruct new tuples and copy to output |  | 
        |  | 
        
        | Term 
 
        | What are the characteristics of nested-loop joins? |  | Definition 
 
        | 
Also known as iteration joinAssumming that we're joining the relations R,S on attribute C, the algorithm is as follows:
foreach tuple r in R
foreach tuple s in S
if r.C = s.C then output r,s pair   |  | 
        |  | 
        
        | Term 
 
        | What are some factors that affect cost? |  | Definition 
 
        | 
Tuples of a relation stored physically together (clustered)Relations sorted by join attributeIndexes exist?   |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a tuple-based nested loop join with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | 
Relations not contiguous
One disk access per tuple Cost for each tuple in R1=cost to read tuple + cost to read R2
Total Cost = T(R1)*(1+T(R2)) =10,000 * (1+5,000) = 50,010,000 disk accesses It is possible to do better by using all available main memory (M=101), read outer relation R1 in chunks of 100 blocks, and read all inner relation2 (using 1 block) + join   |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a block-based nested loop join with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | 
Cost to read one 100-block chunk of R1 = 100*1/S(R1) = 1,000 disk accessesCost to process each chunk = 1000 + T(R2) = 6,000Total cost = T(R1)/ 1000 * 6000 = 60,000 disk accesses It's possible to do better by reversing the join order ( i.e. R1 becomes the inner relation, R2 becomes the outer relation). |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a join reordering with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | We are reversing the join order ( i.e. R1 becomes the inner relation, R2 becomes the outer relation). 
Cost to read one 100-block chunk of R2=100*1/S(R2)=1,000 disk accessesCost to process each chunk = 1000+T(R1) = 11,000Total cost = T(R1)/1000*11,000 = 55,000 disk accesses It's possible to do better if the tuples in each relation are contiugous (i.e. clustered) |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through Contiguous Relations with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | Tuples in each relation are contiugous (i.e. clustered) 
B(R1)=1000, B(R2)=500Cost to read one 100-block chunk of R2=100 disk accessesCost to process each chunk = 100+B(R1)=1100Total cost=B(R2)/100*1,100=5,500 disk accesses We can do better if both relations are contiguous and sorted by C, the join attribute |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a merge join with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | Both relations are contiguous and sorted by C, the join attribute   Total cost = B(R1)+B(R2)=1000+500 = 1,500 disk accesses   Problem is if R1 and R2 aren't sorted by C, as it is needed to sort them. |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a Two Pass Algorithm with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | If R1 and R2 are not sorted by C, we need to sort them - this can be done with mergesort: 
Foreach 100 block chunk of R
Read chunk sort in memoryWrite to disk Read all chunks + merge + write out   |  | 
        |  | 
        
        | Term 
 
        | What is the cost of the merge sort? |  | Definition 
 
        | Each tuple is read, written, read, written... so: 
Sort cost R1: 4x1000 = 4,000 disk accessesSort cost R2: 4x500 = 2,000 disk accesses   |  | 
        |  | 
        
        | Term 
 
        | 
Solve a join between relations R1, R2 on attribute C through a Merge Join with sort with the following elements:
T(R1) = 10,000T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101 blocks |  | Definition 
 
        | 
R1,R2 contiguous but usorted:Total Cost = sort cost + join cost= 6,000 + 1,500= 7,500 disk accesses 
Nested loop cost = 5,500 disk accesses
Merge join does not necessarily pay off 
 If R1=10,000blocks contiugous and R2=5,000blocks not ordered   Nested loop cost = 505,000 disk accesses Merge join cost = 75,000 disk accesses   In this case merge join with sort is better       |  | 
        |  | 
        
        | Term 
 | Definition 
 
        | Are trees that represent queries visually by breaking them down into components:   SELECT PNUMBER,DNUM,LNAME,ADDRESS,DATE FROM PROJECT,DEPARTMENT, EMPLOYEE WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'   [image] |  | 
        |  | 
        
        | Term 
 
        | What are the optimization steps? |  | Definition 
 
        | Heuristic approach 
Start with canonical formMove o operators down the treeReorder subtrees to put most restrictive o firstCombine x and o to form |><|Move pi operators down the tree |  | 
        |  | 
        
        | Term 
 
        | Show the diagram for the Query plan, and show where to choose heuristics |  | Definition 
 | 
        |  |