Shared Flashcard Set

Details

Title

COMP3017 - Lecture 5 - Query Processing

Description

COMP3017 - Lecture 5 - Query Processing

Total Cards

Subject

Computer Science

Level

Undergraduate 3

Created

05/02/2014

Click here to study/print these flashcards.

Create your own flash cards! Sign up here.

Additional Computer Science Flashcards

Cards Return to Set Details

Term

Draw the diagram for Query Processing

Definition

[image]

Term

What are the types of query plans and their characteristics?

Definition

Logical query plan

Alegebraic representation of query

Operators taken from relational algebra

Is abstract

Physical query plan

Algorithms selected for each operator in plan

Execution order for specific operators

Term

Please show in the diagra where does query optimization take place

Definition

[image]

Term

What does the 'selecting logical query plan' stage consist of?

Definition

It takes place after the parse query, before select physical query plan, and it consists of

Generate Logical Query plan

Rewrite logical query plan

Term

What are Physical Plan Operators?

Definition

Algorithm that implements one of the basic relational operations that are used in query plans

e.g. Relational algebra has join operator. How the join is carried out depends on >Structure of relations, > Size of relations, > Presence of indexes and hashes, etc

Term

What is the computational model for Pyhsical plan operators?

Definition

Need to choose good physical plan operators:

Estimate "cost" of each operator

Key measure of cost is the number of disk accesses (far more costly than main memory accesses)

Assumption: arguments of operator are on disk, result is in main memory

Term

What are the cost parameters of Physical Plan operators?

Definition

M: Main memory available for buffers

S(R): Size of a tuple of relation R

B(R): Blocks used to store relation R

T(R): Number of tuples in relation R (Cardinality of R)

V(R,a): Number of distinct values for attribute a in relation R

Term

What is a clustered 1) File, 2) relation and 3) index?

Definition

Tuples from different relations that can be joined (on particular attribute values) stored in blocks together.

[R1 R2 S1 S2] [R3 R4 S3 S4]

Tuples from relation are stored together in blocks, but not necessarily sorted

[R1 R2 R3 R4] [R5 R6 R7 R8]

Index allows tuples to be read in an order that corresponde to physical order

Term

What does scanning consist of?

Definition

e.g. Read all tuples of relation R (that satisfy some predicate)

Two variants

Table Scan

Index Scan

Term

What are the characteristics of Table scan?

Definition

Tuples arranged in blocks

All blocks known to the system

Possible to get blocks one at a time

I/O Cost

B(R) disk accesses, if R is clustered

T(R) disk accesses, if R is not clustered

Term

What are the characteristics of Index Scan?

Definition

An index exists on some attribute of R

Use index to find all blocks holding R

Retreive blocks for R

I/O Cost

B(R) + B(I_R) disk accesses if clustered

B(R) >> B(I_R), so treat as only B(R)

T(R) disk accesses if not clustered

Term

What are the main characteristics of one-pass algorithms?

Definition

Read data from disk only once

Typically require that at least one argument fits in main memory

Three broad categories:

Unary, tuple at a time (i.e. select, project)

Unary, full relation (i.e. duplicate elimination, grouping)

Binary, full relation

Term

Describe the one-pass unary, tuple at a time

Definition

Algorithm:

Foreach block of R:

Copy block to input buffer

Perform operation (select, project) on each tuple in block

Move selected/projected tuples to output buffer

Cost:

In general, B(R) or T(R) disk accesses depending on clustering

If operator is a select that compares an attribute to a constant and index exists for attributes used in select, <<B(R) disk accesses

Requires M >= 1

Term

Describe the Unary, Full-relation, one pass algorithm and cost

Definition

Algorithm (Normal):

foreach block of R

Copy block to input buffer

update accumulator

move tuples to output buffer

Algorithm (Duplicate Elimination)

foreach block of R:

copy block to input buffer

foreach tuple in block

If tuple is not in accumulator

copy to output buffer

else copy to accumulator

Requires M >= B(d(R))+1 block of main memory

1 block for input buffer

M-1 blocks for accumulator (records each tuple seen so far)

Accumulator implemented as in-memory data structure (tree, hash)

If fewer than B(d(R)) blocks available, thrashing likely

Cost is B(R) disk accesses

Term

What are the characteristics of grouping in a unary, full relation?

Definition

Grouping operators: min, max, sum, count, avg

Accumulator contains per-group values

Output contains per-group values

Cost is B(R) disk accesses

Term

What are the characteristics of the Binary, Full-relation?

Definition

Union, intersection, difference, product, join

In general, cost is B(R) + B(S). (R, S are operand relations)

Requirement for one pass operation:

min(B(R), B(S)) < M-1

Term

What are the characteristics of the join, binary ful relation?

Definition

Two relations, R(X,Y) and S(Y,Z), B(S)<B(R)

Uses main memory search structure keyed on Y

foreach block of S:

read block

add tuples to search structure

foreach block of R:

copy block to input buffer

foreach tuple in block:

Find matching tuples in search structure

Construct new tuples and copy to output

Term

What are the characteristics of nested-loop joins?

Definition

Also known as iteration join

Assumming that we're joining the relations R,S on attribute C, the algorithm is as follows:

foreach tuple r in R

foreach tuple s in S

if r.C = s.C then output r,s pair

Term

What are some factors that affect cost?

Definition

Tuples of a relation stored physically together (clustered)

Relations sorted by join attribute

Indexes exist?

Term

Solve a join between relations R1, R2 on attribute C through a tuple-based nested loop join with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

Relations not contiguous

One disk access per tuple

Cost for each tuple in R1=cost to read tuple + cost to read R2

Total Cost =

T(R1)*(1+T(R2)) =

10,000 * (1+5,000) =

50,010,000 disk accesses

It is possible to do better by using all available main memory (M=101), read outer relation R1 in chunks of 100 blocks, and read all inner relation2 (using 1 block) + join

Term

Solve a join between relations R1, R2 on attribute C through a block-based nested loop join with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

Cost to read one 100-block chunk of R1 = 100*1/S(R1) = 1,000 disk accesses

Cost to process each chunk = 1000 + T(R2) = 6,000

Total cost = T(R1)/ 1000 * 6000 = 60,000 disk accesses

It's possible to do better by reversing the join order ( i.e. R1 becomes the inner relation, R2 becomes the outer relation).

Term

Solve a join between relations R1, R2 on attribute C through a join reordering with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

We are reversing the join order ( i.e. R1 becomes the inner relation, R2 becomes the outer relation).

Cost to read one 100-block chunk of R2=100*1/S(R2)=1,000 disk accesses

Cost to process each chunk = 1000+T(R1) = 11,000

Total cost = T(R1)/1000*11,000 = 55,000 disk accesses

It's possible to do better if the tuples in each relation are contiugous (i.e. clustered)

Term

Solve a join between relations R1, R2 on attribute C through Contiguous Relations with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

Tuples in each relation are contiugous (i.e. clustered)

B(R1)=1000, B(R2)=500

Cost to read one 100-block chunk of R2=100 disk accesses

Cost to process each chunk = 100+B(R1)=1100

Total cost=B(R2)/100*1,100=5,500 disk accesses

We can do better if both relations are contiguous and sorted by C, the join attribute

Term

Solve a join between relations R1, R2 on attribute C through a merge join with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

Both relations are contiguous and sorted by C, the join attribute

Total cost = B(R1)+B(R2)=1000+500

= 1,500 disk accesses

Problem is if R1 and R2 aren't sorted by C, as it is needed to sort them.

Term

Solve a join between relations R1, R2 on attribute C through a Two Pass Algorithm with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

If R1 and R2 are not sorted by C, we need to sort them - this can be done with mergesort:

Foreach 100 block chunk of R

Read chunk

sort in memory

Write to disk

Read all chunks + merge + write out

Term

What is the cost of the merge sort?

Definition

Each tuple is read, written, read, written... so:

Sort cost R1: 4x1000 = 4,000 disk accesses

Sort cost R2: 4x500 = 2,000 disk accesses

Term

Solve a join between relations R1, R2 on attribute C through a Merge Join with sort with the following elements:

T(R1) = 10,000

T(R2) = 5,000

S(R1) = S(R2) = 1/10 block

M = 101 blocks

Definition

R1,R2 contiguous but usorted:

Total Cost = sort cost + join cost

= 6,000 + 1,500

= 7,500 disk accesses

Nested loop cost = 5,500 disk accesses

Merge join does not necessarily pay off

If R1=10,000blocks contiugous

and

R2=5,000blocks not ordered

Nested loop cost = 505,000 disk accesses

Merge join cost = 75,000 disk accesses

In this case merge join with sort is better

Term

What are query trees?

Definition

Are trees that represent queries visually by breaking them down into components:

SELECT PNUMBER,DNUM,LNAME,ADDRESS,DATE FROM PROJECT,DEPARTMENT, EMPLOYEE WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'

[image]

Term

What are the optimization steps?

Definition

Heuristic approach

Start with canonical form

Move o operators down the tree

Reorder subtrees to put most restrictive o first

Combine x and o to form |><|

Move pi operators down the tree

Term

Show the diagram for the Query plan, and show where to choose heuristics

Definition

[image]

Flashcard Machine - create, study and share online flash cards

Shared Flashcard Set

Details

Additional Computer Science Flashcards

Cards Return to Set Details

My Flashcards

Flashcard Library

Browse

About

Help

Mobile