Term
Give an example of a B+tree |
|
Definition
|
|
Term
|
Definition
- An ordered binary tree where the leafs can have more than one node/value.
- Root node typically kept in memory, as it is the entrance point to the index
|
|
|
Term
What are some characteristics of leaf nodes? |
|
Definition
- If index is a primary index
- Leaf nodes are records contiaining data, stored in the order of the primary key
- The index provides an alternative to a sequential scan
- If the index is a secondary index
- Leaf nodes contain pointers to the data records
- Data can be accessed in teh sequence of the secondary key
- A secondary index can point to any sort of data file, for example one created by hashing
|
|
|
Term
What are characteristics of nodes? |
|
Definition
- Each node is of fixed size and contains n keys with n+1 pointers
- Minimum nodes: Don't want nodes to be too empty (efficient use of space)
- Non-Leaf: [(n+1)/2] pointers
- Leaf: [(n+1)/2] pointers
|
|
|
Term
What are the rules for B+tree rules? |
|
Definition
- All leaves same distance from root (balanced tree)
- Pointers in leaves point to records except for "sequence pointer"
- Number of pointers/keys for B+tree of order n:[image]
|
|
|
Term
Give an example of B+tree arithmetic |
|
Definition
- Parameters:
- Block size 8kb, of which b=8000 bytes available for storage of records
- key length k=10 bytes
- record lengths r=100 bytes (including key)
- key pointer p=6 bytes
- A leaf node in a primary index can accomodate lp records, where lp=floor(b/r) = 80 records
- A leaf node in a secondary index can accomodate ls records where ls=floor((b-p)/(k+p)) = 499 records
- A non-leaf node could accomodate i entries, where i=floor((b-p)/(k+p)=499 records
To allow for expansion, assume initial node occupancy of u, where u=0.6
|
|
|
Term
Give an example of b+tree primary index arithmetic |
|
Definition
- Parameters:
- Block size 8kb, of which b=8000 bytes available for storage of records
- key length k=10 bytes
- record lengths r=100 bytes (including key)
- key pointer p=6 bytes
- For primary index(the leaf nodes hold the records)
- A non-leaf node initially points to i*u = 299 blocks
- Each leaf initially contains lp*u = 48 records
- 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records
- 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 4,291,248 records
|
|
|
Term
Give an example of arithmetic for B+tree secondary index |
|
Definition
- Parameters:
- Block size 8kb, of which b=8000 bytes available for storage of records
- key length k=10 bytes
- record lengths r=100 bytes (including key)
- key pointer p=6 bytes
- For primary index(the leaf nodes hold the records)
- A non-leaf node initially points to i*u = 299 blocks
- Each leaf initially contains lp*u = 299 records
- 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records
- 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 26,730,899 records
|
|
|
Term
What should be considered in B+tree insertion? |
|
Definition
- Space available in leaf
- Leaf overflow
- Non-leaf overflow
- New root
|
|
|
Term
What should be considered for B+Tree deletion? |
|
Definition
- Simple case
- Re-distribute keys
- Coalesce with sibling (Often not implemented due to complexity)
|
|
|
Term
What is the comparison of B-trees vs. Static indexed sequential files? |
|
Definition
- Btrees consume more space
- Blocks are not contiguous
- Fewer disk accesses for static indexes, even allowing for reorganization
- Concurrency control is harder in B-trees
- However, DBA does not know when to reorganize or how full to load pages of new index
|
|
|
Term
What are the main concepts of hashing? |
|
Definition
- Main memory hash table
- Hash function h() takes a key and computes an integer value
- Value is used to select a bucket from a bucket array
- Bucket array contains linked lists of records
- Secondary storage hash table
- Stores many more records than a main memory hash table
- Bucket array consists of disk blocks
|
|
|
Term
What are some hashing approaches? |
|
Definition
- Hash function calculates block pointer directly or as an offset from first block
- Requires blocks to be in fixed, consecutive locations
- Hash function calculates offset in aray of block pointers (directory)
- Used for "secondary" search keys
|
|
|
Term
Give an example of a hash function |
|
Definition
- Key='x1 x2 ... xn' (n byte character string), b buckets
- h: add x1+x2+ ... xn and compute sum modulo b
- Not a particularly good function
- Good hash function has the same expected number of keys per bucket for each bucket
|
|
|
Term
Should keys be kept sorted? (In hashing) |
|
Definition
Buckets should be kept sorted if CPU time is critical and inserts/deletes are relative infrequent |
|
|
Term
What is the utilisation rule of thumb in hashing? |
|
Definition
- Space utilisation should be between 50% and 80%
- Utilisation = #keys used / total #keys that fit
- If < 50% wasting space
- If > 80% overflows significant
- Depends on how good hash function is and on #keys/bucket
|
|
|
Term
How is it possible to cope with growth in hashing? |
|
Definition
- Overflows and reorganizations
- Dynamic hashing
|
|
|
Term
What is extensible hashing? |
|
Definition
Combines two ideas:
- Use i of b bits output by hash function where i grows over time
- Use directory
[image] |
|
|
Term
What are the pros and cons of extensible hashing? |
|
Definition
- Pros:
- Can handle growing files
- Wish less wasted space
- With no full reorganizations
- Cons:
- Indirection
- Not bad if directory in memory
- Directory doubles in size
- Now it fits in memory, now it doesn't
- Sudden increase in disk accesses
|
|
|
Term
What are the main concepts of linear hashing? |
|
Definition
- Another dynamic hashing scheme
- Keeps track of utilisation. U=#used slots / total #slots
- if U>threshold, then increase m (and maybe i)
- Combines two ideas:
- Use i least significant bits of hash, where i grows over time
- Hash file grows incrementally and linearly (unlike extensible hash file, which periodically increases)
|
|
|
Term
What are the pros and cons of linear hashing? |
|
Definition
- Pros:
- Can handle growing files
- With less wasted space
- With no full reorganizations
- No indication like extensible hashing
- Cons:
- Can still have overflow chains
|
|
|
Term
What are some key points of index vs hashing? |
|
Definition
- Hashing good for probes given a key:
- SELECT {...} FROM R WHERE R.A=5
- Indexing (Including B-trees) good for range searches):
- SELECT FROM R WHERE R.A > 5
|
|
|