Term
What are some considerations when placing records in blocks? |
|
Definition
- Separating Records
- Spanned vs. Unspanned
- Sequencing
- Indirection
|
|
|
Term
What are some approaches to separating records in a block? |
|
Definition
Three approaches:
- Use fixed length records - no need to separate
- Use a special marker to indicate record end
- Give record lengths (or offsets)
- Within each record
- In block header
|
|
|
Term
What are some points of Spanned vs. Unspanned blocks? |
|
Definition
- Unspanned: Each record must fit within a single block
- Simpler but may waste space
- Problem if record size > block size
- Spanned: Records may be split between blocks
- If record doesn't fit, it needs a pointer to where the rest of the data is
- The rest of the data needs a pointer to where the beginning of the data is
|
|
|
Term
What are some key points of squencing? |
|
Definition
- Ordering records in file (and block) by some key value
- This makes it possible to efficiently read records in order (e.g. to do a merge-join)
- Options for sequencing are:
- Next record physically contiguous
- Linked records (not physically contiguous)
- Overflow area (Records in sequence)
|
|
|
Term
What are some characteristics of indeirection? |
|
Definition
- The ways to refer to records can be:
- Physical addressing
- Indirect addressing
- Other options in between
- Tradeoff between:
- Flexibility (easier to move records on insertion/deletion)
- Cost (of maintaining indirection)
|
|
|
Term
Give the specification for Physical Addressing |
|
Definition
|
|
Term
Give an example of indirect addressing |
|
Definition
Record ID is arbitrary bit string
[image] |
|
|
Term
How is indirection implemented in block? |
|
Definition
- Records can be shifted within block without changing RID
- Acces to a given RID is fast - only a single block access needed
[image] |
|
|
Term
What are some characteristics of Block headers? |
|
Definition
- Data at beginning of block that describes block, and may contain
- File ID (or RELATION or DB ID)
- This block ID
- Pointer to free space
- Type of block (e.g. contains recs type 4; is overflow,...)
- Pointer to other blocks "like it"
- Timestamp
|
|
|
Term
What are some key points of Insertion? |
|
Definition
- When records are not in sequence (Easy case)
- Insert new record at end of file or in deleted slot
- If records are variable size it is not as easy
- Records in squence (Hard)
- If there's free space "close by", not too bad...
- Or use overflow idea...
- Considerations:
- How much free space to leave in each block,track, cylinder?
- How often do I recognize file+overflow?
|
|
|
Term
What are some main points for deletion? |
|
Definition
- How to deal with dangling pointers
- Tombstones (Leave "MARK" in map or old location, or logical IDs)
- Two main options:
- Immediately reclaim space
- Mark space as deleted
- May need chain of deleted records (for re-use)
- Need a way to mark deleted records:
- special characters
- delete field
- in map
|
|
|
Term
What are the details of Single Buffering? |
|
Definition
- File consisting of blocks B1, B2,...
- Program that processes the file block-by-block
- Read B1 -> Buffer
- Process Data in buffer
- Read B2 -> Buffer
- Process Data in Buffer
- etc...
Single Buffer Cost: n(P+R)
P = Time to process a block
R = Time to read a block
n = Number of blocks
(Recall Double Buffer Time = R + nP) |
|
|
Term
What are the key concepts of double buffering? |
|
Definition
Use a pair of buffers
- While reading a block and writing into buffer A
- Process block previously read into buffer B
- After block read into A, process A and read next block into B
Double Buffer Time = R + nP
P = Time to process a block
R = Time to read a block
n = Number of blocks
(Recall Single Buffer time = n(R+P)) |
|
|
Term
What are some key concepts of address management? |
|
Definition
- Every block and record has two addresses
- A database address (when in secondary storage)
- A memory address (when copied into main memory)
- When in main memory, using memory address (=pointers) is more efficient
- Otherwise, translation table is required
- Converts database address into current memory address
|
|
|
Term
What are some main concepts of pointer swizzling? |
|
Definition
- General term for techniques used to translate database address to virtual memory address space
- Swizzled pointers consist of
- One bit to indicate whether the pointer is a database address or a memory address
- Database or memory pointer, as appropriate
|
|
|
Term
Give a visual example of swizling pointers from database to virtual memory address space |
|
Definition
|
|
Term
What are some Swizzling Strategies? |
|
Definition
- Automatic
- As soon as block brought into memory, locate all pointers and addresses and enter them into translation table
- Replace pointers in blocks with new entries
- On demand
- Leave all pointers unswizzled whenblock in brought into memory
- Swizzle pointers only when dereferenced
- No swizzling
- Use translation table to map pointers on each dereference
|
|
|
Term
What are some key concepts of swizzling? |
|
Definition
- Reverse of the swizzling operation
- When a block is written back to disk, rewrite swizzled pointers using the translate table
- Need to beware of pinned blocks (that cannot yet be safely written to disk)
|
|
|
Term
Give an example of row column store |
|
Definition
Row store:
----------------------------------------------------
| id1 || cust1 || prod1 || date1 |
----------------------------------------------------
| id2 || cust2 || prod2 || date2 |
----------------------------------------------------
Column store:
--------- --------------
| id1 || || cust1 ||
----------- --------------
| id2 || || cust2 ||
--------- --------------
--------- --------------
| id1 || || prod1 ||
----------- --------------
| id2 || || prod2 ||
--------- --------------
--------- --------------
| id1 || || date1 ||
----------- --------------
| id2 || || date2 ||
--------- -------------- |
|
|
Term
What are the advantages of row and column store |
|
Definition
- Advantages of Column Store:
- More efficient compact storage (Fields need not start at byte boundaries)
- Efficient reads on data mining operations
- Advantages of Row Store:
- Writes (Multiple fields of one record) more efficient
- Efficient reads for record access (OLTP)
|
|
|