Term
If we have a personal database, with EMPLOYEE table and attributes Dept, Salary.
How can we find employees who work in the sales department and have salaries greater than 40,000? |
|
Definition
- Approach 1
- Get all matching records using an index on one attribute
- Check values of other attribute on those records
- Approach 2
- Use secondary indexes on each attribute to get two sets of record pointers
- Take intersection of sets
- Approach 3
- Use secondary inde on one attribute to select suitable index on other attribute
- Get all matching recrods using selected index
|
|
|
Term
What are the main concepts of a Grid File? |
|
Definition
- Partition multi-dimensional space with a grid
- Grid lines partition space into stripes
- Intersections of stripes from different dimensions define regions
- Each region associated with a pointer to a bucket of record pointers
- Attribute for values for record determine region and therefore bucket
- Fixed number of regions - overflow blocks used to increase bucket size as necessary
- Can index grid on values ranges
|
|
|
Term
What are the Pros and Cons of Grid Files? |
|
Definition
- Pros
- Good for multiple-key search
- Supports partial-match, range and nearest-neighbour queries
- Cons
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys
|
|
|
Term
What are the main concepts of partitioned hash? |
|
Definition
- Hash functions takes a list of attribute values as arguments
- Bits of hash value are divided between attributes
- Effectively a hash function per attribute
|
|
|
Term
What are the pros and cons of partitioned hash? |
|
Definition
- Pros
- Good hash function will evenly distribute records between buckets
- Supports partial-match queries
- Cons
- No good for nearest-neighbour or range queries
|
|
|
Term
What are the main characteristics of a kd-tree? |
|
Definition
- Multidimentional binary search tree
- Each node splits the k-dimensional space along a hyperplane
- Nodes contain
- An attribute-value pair
- A pair of pointers
- All nodes at the same level discriminate for the same attribute
- Levels rotate between attributes of all dimensions
- Partial-match queries
- If we know value of attribute, we can choose which branch to explore
- If we don't know value of attribute, must explore both branches
|
|
|
Term
Give an example of partitioned hash |
|
Definition
|
|
Term
Give an example of a kd-tree |
|
Definition
|
|
Term
How is it possible to adapt kd-trees to secondary storage? |
|
Definition
- Average path length form root to leaf: log2(n)
- Disk accesses should be kept as few as possible
- Two approaches
- Multiway nodes (split values into n ranges)
- Group nodes in blocks (node plus descendants to a given ply)
|
|
|
Term
What are Region Quad trees? |
|
Definition
- Each partition divides the space into four equal sub-regions
- Split regions if they contain records that will fit into a block
- Operations similar to those kd-trees
|
|
|
Term
Give an example of a region quad-tree |
|
Definition
|
|
Term
What are the main characteristics of Point Quad-Tree? |
|
Definition
- Partitions are not equal area
- Split lines centred on data points
- ne/nw/se/sw sub-regions
- Otherwise, equivalent region quad-tree
|
|
|
Term
Give an example of a Point Quad-tree |
|
Definition
|
|
Term
What are the main characteristics of R-Trees? |
|
Definition
- Used to represent data that consists of k-dimensional data regions
- International nodes of tree represent regions that contain data regions
- Regions typically defined as top-right, bottom-left coordinates
|
|
|
Term
Give an example of an R-tree |
|
Definition
|
|
Term
What are the main characteristics of an UB-tree? |
|
Definition
Basic approach
- Map n-dimensional sparce space onto a 1-dimensional line using a fractal space-filling curve
- Partion ranges and index using a B+tree
- When querying, identify regions of n-d space (= segments 1-d line) that intersects with query triangle)
|
|
|
Term
What are the characteristics of the Z-index in a UB-tree? |
|
Definition
- Map domain of each attribute onto n-bit integer
- Order of points on Z-curve given by bit interleaving the portions of the axes
- x = x1x2
- y = y1y2
- z-index = y1x2y2x2
|
|
|
Term
What does the z-region partition consist of? |
|
Definition
- Z-curve partitioned into contiguous ranges (z-regions)
- Note that these may not be contiguous regions in the multidimensional space
- Z-regions mapped to leaf nodes of a B+tree
- A leaf node contains pointers to records whose attribute value locate them within the associated z-region
|
|
|
Term
How is it possible to query UB-trees? |
|
Definition
- Multidimensional range query can be considered as a k-dimensional rectangle
- Algorithm identifies z-regions that intersect wth the query rectangle
|
|
|
Term
What are the key characteristics fo Bitmap Indexes? |
|
Definition
- Collection of bit-vectors used to index an attribute
- One bit-vector for each unique attribute value
- Once bit for each record
- Querying index involves combining bit-vectors with bitwise operators (&,|)
- A 1 in the ith position indicates that record i is a match
|
|
|
Term
Give the Bitmap Index Table for the following situation:
An online homeware vendor sells products p1... p10
- Products p3 and p5 cost £100 pounds
- products p1 costs £200
- products p2,p7 and p10 costs £300
- Products p4, p6, p8 and p9 cost £400
- Products p1, p4, p5 and p9 are designed for lounges
- Products p5 and p7 are designed for dinning rooms
- Products p3, p5, p6 and p10 are designed for kitchens
|
|
Definition
|
|
Term
What are the compression characteristics of Bitmap Indexes? |
|
Definition
- Bit vactors are typcially sparse, with few 1 bits
- Large amounts of wasted space
- Run-length encoding of bit-vectors to reduce stored size
- Bitwise operators must be applied to original bit-vectors
- Can decode RLE bit-vectors one run at a time
|
|
|
Term
What are the pros and cons of Bitmap indexes? |
|
Definition
- Pros
- Efficient answering of partial-match queries
- Cons
- Requires fixed record numbers
- Changes to data file require changes to bitmap index
|
|
|