Shared Flashcard Set

Details

Title

COMP3017 - Lecture 2 (C) - Data

Description

COMP3017 - Lecture 2 (C) - Data

Total Cards

Subject

Computer Science

Level

Undergraduate 2

Created

05/01/2014

Click here to study/print these flashcards.

Create your own flash cards! Sign up here.

Additional Computer Science Flashcards

Cards Return to Set Details

Term

What are some considerations when placing records in blocks?

Definition

Separating Records

Spanned vs. Unspanned

Sequencing

Indirection

Term

What are some approaches to separating records in a block?

Definition

Three approaches:

Use fixed length records - no need to separate

Use a special marker to indicate record end

Give record lengths (or offsets)

Within each record

In block header

Term

What are some points of Spanned vs. Unspanned blocks?

Definition

Unspanned: Each record must fit within a single block

Simpler but may waste space

Problem if record size > block size

Spanned: Records may be split between blocks

If record doesn't fit, it needs a pointer to where the rest of the data is

The rest of the data needs a pointer to where the beginning of the data is

Term

What are some key points of squencing?

Definition

Ordering records in file (and block) by some key value

This makes it possible to efficiently read records in order (e.g. to do a merge-join)

Options for sequencing are:

Next record physically contiguous

Linked records (not physically contiguous)

Overflow area (Records in sequence)

Term

What are some characteristics of indeirection?

Definition

The ways to refer to records can be:

Physical addressing

Indirect addressing

Other options in between

Tradeoff between:

Flexibility (easier to move records on insertion/deletion)

Cost (of maintaining indirection)

Term

Give the specification for Physical Addressing

Definition

[image]

Term

Give an example of indirect addressing

Definition

Record ID is arbitrary bit string

[image]

Term

How is indirection implemented in block?

Definition

Records can be shifted within block without changing RID

Acces to a given RID is fast - only a single block access needed

[image]

Term

What are some characteristics of Block headers?

Definition

Data at beginning of block that describes block, and may contain

File ID (or RELATION or DB ID)

This block ID

Pointer to free space

Type of block (e.g. contains recs type 4; is overflow,...)

Pointer to other blocks "like it"

Timestamp

Term

What are some key points of Insertion?

Definition

When records are not in sequence (Easy case)

Insert new record at end of file or in deleted slot

If records are variable size it is not as easy

Records in squence (Hard)

If there's free space "close by", not too bad...

Or use overflow idea...

Considerations:

How much free space to leave in each block,track, cylinder?

How often do I recognize file+overflow?

Term

What are some main points for deletion?

Definition

How to deal with dangling pointers

Tombstones (Leave "MARK" in map or old location, or logical IDs)

Two main options:

Immediately reclaim space

Mark space as deleted

May need chain of deleted records (for re-use)

Need a way to mark deleted records:

special characters

delete field

in map

Term

What are the details of Single Buffering?

Definition

File consisting of blocks B1, B2,...

Program that processes the file block-by-block

Read B1 -> Buffer

Process Data in buffer

Read B2 -> Buffer

Process Data in Buffer

etc...

Single Buffer Cost: n(P+R)

P = Time to process a block

R = Time to read a block

n = Number of blocks

(Recall Double Buffer Time = R + nP)

Term

What are the key concepts of double buffering?

Definition

Use a pair of buffers

While reading a block and writing into buffer A

Process block previously read into buffer B

After block read into A, process A and read next block into B

Double Buffer Time = R + nP

P = Time to process a block

R = Time to read a block

n = Number of blocks

(Recall Single Buffer time = n(R+P))

Term

What are some key concepts of address management?

Definition

Every block and record has two addresses

A database address (when in secondary storage)

A memory address (when copied into main memory)

When in main memory, using memory address (=pointers) is more efficient

Otherwise, translation table is required

Converts database address into current memory address

Term

What are some main concepts of pointer swizzling?

Definition

General term for techniques used to translate database address to virtual memory address space

Swizzled pointers consist of

One bit to indicate whether the pointer is a database address or a memory address

Database or memory pointer, as appropriate

Term

Give a visual example of swizling pointers from database to virtual memory address space

Definition

[image]

[image]

[image]

Term

What are some Swizzling Strategies?

Definition

Automatic

As soon as block brought into memory, locate all pointers and addresses and enter them into translation table

Replace pointers in blocks with new entries

On demand

Leave all pointers unswizzled whenblock in brought into memory

Swizzle pointers only when dereferenced

No swizzling

Use translation table to map pointers on each dereference

Term

What are some key concepts of swizzling?

Definition

Reverse of the swizzling operation

When a block is written back to disk, rewrite swizzled pointers using the translate table

Need to beware of pinned blocks (that cannot yet be safely written to disk)

Term

Give an example of row column store

Definition

Row store:

----------------------------------------------------

| id1 || cust1 || prod1 || date1 |

----------------------------------------------------

| id2 || cust2 || prod2 || date2 |

----------------------------------------------------

Column store:

---------   --------------

| id1   ||   || cust1 ||

----------- --------------

| id2   ||   || cust2 ||

---------   --------------

---------   --------------

| id1   ||   || prod1 ||

----------- --------------

| id2   ||   || prod2 ||

---------   --------------

---------   --------------

| id1   ||   || date1 ||

----------- --------------

| id2   ||   || date2 ||

---------   --------------

Term

What are the advantages of row and column store

Definition

Advantages of Column Store:

More efficient compact storage (Fields need not start at byte boundaries)

Efficient reads on data mining operations

Advantages of Row Store:

Writes (Multiple fields of one record) more efficient

Efficient reads for record access (OLTP)

Flashcard Machine - create, study and share online flash cards

Shared Flashcard Set

Details

Additional Computer Science Flashcards

Cards Return to Set Details

My Flashcards

Flashcard Library

Browse

About

Help

Mobile