Shared Flashcard Set

Details

Title

COMP3017 - Lecture 3 (B) - Access Structures

Description

COMP3017 - Lecture 3 (B) - Access Structures

Total Cards

Subject

Computer Science

Level

Undergraduate 2

Created

05/01/2014

Click here to study/print these flashcards.

Create your own flash cards! Sign up here.

Additional Computer Science Flashcards

Cards Return to Set Details

Term

Give an example of a B+tree

Definition

[image]

Term

What is a B+tree?

Definition

An ordered binary tree where the leafs can have more than one node/value.

Root node typically kept in memory, as it is the entrance point to the index

Term

What are some characteristics of leaf nodes?

Definition

If index is a primary index

Leaf nodes are records contiaining data, stored in the order of the primary key

The index provides an alternative to a sequential scan

If the index is a secondary index

Leaf nodes contain pointers to the data records

Data can be accessed in teh sequence of the secondary key

A secondary index can point to any sort of data file, for example one created by hashing

Term

What are characteristics of nodes?

Definition

Each node is of fixed size and contains n keys with n+1 pointers

Minimum nodes: Don't want nodes to be too empty (efficient use of space)

Non-Leaf: [(n+1)/2] pointers

Leaf: [(n+1)/2] pointers

Term

What are the rules for B+tree rules?

Definition

All leaves same distance from root (balanced tree)

Pointers in leaves point to records except for "sequence pointer"

Number of pointers/keys for B+tree of order n:[image]

Term

Give an example of B+tree arithmetic

Definition

Parameters:

Block size 8kb, of which b=8000 bytes available for storage of records

key length k=10 bytes

record lengths r=100 bytes (including key)

key pointer p=6 bytes

A leaf node in a primary index can accomodate lp records, where lp=floor(b/r) = 80 records

A leaf node in a secondary index can accomodate ls records where ls=floor((b-p)/(k+p)) = 499 records

A non-leaf node could accomodate i entries, where i=floor((b-p)/(k+p)=499 records

To allow for expansion, assume initial node occupancy of u, where u=0.6

Term

Give an example of b+tree primary index arithmetic

Definition

Parameters:

Block size 8kb, of which b=8000 bytes available for storage of records

key length k=10 bytes

record lengths r=100 bytes (including key)

key pointer p=6 bytes

For primary index(the leaf nodes hold the records)

A non-leaf node initially points to i*u = 299 blocks

Each leaf initially contains lp*u = 48 records

1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records

2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 4,291,248 records

Term

Give an example of arithmetic for B+tree secondary index

Definition

Parameters:

Block size 8kb, of which b=8000 bytes available for storage of records

key length k=10 bytes

record lengths r=100 bytes (including key)

key pointer p=6 bytes

For primary index(the leaf nodes hold the records)

A non-leaf node initially points to i*u = 299 blocks

Each leaf initially contains lp*u = 299 records

1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records

2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 26,730,899 records

Term

What should be considered in B+tree insertion?

Definition

Space available in leaf

Leaf overflow

Non-leaf overflow

New root

Term

What should be considered for B+Tree deletion?

Definition

Simple case

Re-distribute keys

Coalesce with sibling (Often not implemented due to complexity)

Term

What is the comparison of B-trees vs. Static indexed sequential files?

Definition

Btrees consume more space

Blocks are not contiguous

Fewer disk accesses for static indexes, even allowing for reorganization

Concurrency control is harder in B-trees

However, DBA does not know when to reorganize or how full to load pages of new index

Term

What are the main concepts of hashing?

Definition

Main memory hash table

Hash function h() takes a key and computes an integer value

Value is used to select a bucket from a bucket array

Bucket array contains linked lists of records

Secondary storage hash table

Stores many more records than a main memory hash table

Bucket array consists of disk blocks

Term

What are some hashing approaches?

Definition

Hash function calculates block pointer directly or as an offset from first block

Requires blocks to be in fixed, consecutive locations

Hash function calculates offset in aray of block pointers (directory)

Used for "secondary" search keys

Term

Give an example of a hash function

Definition

Key='x1 x2 ... xn' (n byte character string), b buckets

h: add x1+x2+ ... xn and compute sum modulo b

Not a particularly good function

Good hash function has the same expected number of keys per bucket for each bucket

Term

Should keys be kept sorted? (In hashing)

Definition

Buckets should be kept sorted if CPU time is critical and inserts/deletes are relative infrequent

Term

What is the utilisation rule of thumb in hashing?

Definition

Space utilisation should be between 50% and 80%

Utilisation = #keys used / total #keys that fit

If < 50% wasting space

If > 80% overflows significant

Depends on how good hash function is and on #keys/bucket

Term

How is it possible to cope with growth in hashing?

Definition

Overflows and reorganizations

Dynamic hashing

Extensible

Linear

Term

What is extensible hashing?

Definition

Combines two ideas:

Use i of b bits output by hash function where i grows over time

Use directory

[image]

Term

What are the pros and cons of extensible hashing?

Definition

Pros:

Can handle growing files

Wish less wasted space

With no full reorganizations

Cons:

Indirection

Not bad if directory in memory

Directory doubles in size

Now it fits in memory, now it doesn't

Sudden increase in disk accesses

Term

What are the main concepts of linear hashing?

Definition

Another dynamic hashing scheme

Keeps track of utilisation. U=#used slots / total #slots

if U>threshold, then increase m (and maybe i)

Combines two ideas:

Use i least significant bits of hash, where i grows over time

Hash file grows incrementally and linearly (unlike extensible hash file, which periodically increases)

Term

What are the pros and cons of linear hashing?

Definition

Pros:

Can handle growing files

With less wasted space

With no full reorganizations

No indication like extensible hashing

Cons:

Can still have overflow chains

Term

What are some key points of index vs hashing?

Definition

Hashing good for probes given a key:

SELECT {...} FROM R WHERE R.A=5

Indexing (Including B-trees) good for range searches):

SELECT FROM R WHERE R.A > 5

Flashcard Machine - create, study and share online flash cards

Shared Flashcard Set

Details

Additional Computer Science Flashcards

Cards Return to Set Details

My Flashcards

Flashcard Library

Browse

About

Help

Mobile