Term
What does the impedance mismatch consist of? |
|
Definition
- To store data persistently in modern programs:
- A single logical structure
- Must be split up (i.e. normalized)
- Object orientation
- Based on software engineering principles
- Mapping from one world to the other has problems
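As a rough illustration of the mismatch, the sketch below stores one in-memory object in two normalized tables and then has to reassemble it. The `Order` and `LineItem` classes, the table names, and the helper functions are hypothetical, chosen only for this example; SQLite stands in for any RDBMS.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)


def save_order(conn: sqlite3.Connection, order: Order) -> None:
    """Split one logical structure across two normalized tables."""
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO line_items (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order.order_id, item.product, item.quantity) for item in order.items],
    )


def load_order(conn: sqlite3.Connection, order_id: int) -> Order:
    """Reassemble the object by reading both tables again."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)).fetchone()
    rows = conn.execute(
        "SELECT product, quantity FROM line_items "
        "WHERE order_id = ? ORDER BY rowid", (order_id,)).fetchall()
    return Order(order_id, customer, [LineItem(p, q) for p, q in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE line_items (order_id INTEGER, product TEXT, quantity INTEGER)")

original = Order(1, "alice", [LineItem("pen", 2), LineItem("ink", 1)])
save_order(conn, original)
restored = load_order(conn, 1)
```

The object survives the round trip, but only because `save_order` and `load_order` do the mapping by hand; that mapping code is where the problems tend to show up.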
|
|
|
Term
|
Definition
- Data size
- Can be dealt with by building bigger databases
- Expensive + physically limited
- Dealt with by building clusters of smaller machines
- Increased data connectivity
|
|
|
Term
What are the problems with relational databases? |
|
Definition
- RDBMS have fundamental issues in dealing with horizontal scale
- Designed to work on single, large machines
- Difficult to distribute effectively
- More subtle: an impedance mismatch
- Create logical structures in memory and then rip them apart to stick them in the RDBMS
- RDBMS data model often disjoint from its intended use (normalization not always good)
- Uncomfortable to program with (joins, ORM, etc.)
|
|
|
Term
What are the key attributes of NoSQL? |
|
Definition
- Non-relational (they can model relations, but aren't good at it)
- Schema free (except the implicit schema, application side)
- Inherently distributed (In different ways, some more than others)
- Open source (mostly; there are exceptions, e.g. Oracle's NoSQL)
|
|
|
Term
What are some key value basics? |
|
Definition
- "A hashtable with persistence"
- Use a key, ask a database for a value
- Key is usually a string
- Value can be anything (DB often unaware of value content)
- Examples:
- Riak
- Buckets/keys/values/links
- Query with key, process with map-reduce
- Secondary indexes (metadata)
- Redis
- More understanding of value types
- In memory (very fast)
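The "hashtable with persistence" idea can be sketched with Python's standard-library `shelve` module, which behaves like a dict backed by a file. This is only an analogy for the access pattern, not how Riak or Redis work internally; the key naming scheme `user:42` and the file name are assumptions made for the example.

```python
import os
import shelve
import tempfile

# A throwaway location for the store file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "kv_store")

# Write: the key is a string, the value is an opaque blob to the store.
with shelve.open(path) as db:
    db["user:42"] = {"name": "alice", "visits": 3}

# Reopen the store later: the value survived, and lookup is by key only.
with shelve.open(path) as db:
    value = db["user:42"]
    missing = db.get("user:99")  # unknown keys simply have no value
```

Note that there is no way to ask the store for "all users named alice" without reading every value; that kind of query needs secondary indexes or a batch job.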
|
|
|
Term
What are the basics for document databases? |
|
Definition
- Database as storage of a mass of different documents
- Is a complex data structure
- Can only contain data from other documents
- Document Data Stores understand their documents
- Queries can run against values of document fields
- Indexes can be constructed for document fields
- Batch style (mapreduce etc.) often supported
- Examples
- MongoDB
- Master/slave design
- .find() queries like ORM
- geo-spatial indexing
- CouchDB
- Master/master
- Only map reduce queries
- Favours availability over consistency
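To illustrate what "understanding the documents" buys, here is a minimal in-memory sketch of querying and indexing on document fields. The `find` and `build_index` helpers and the sample documents are hypothetical; they only mimic the flavour of a document store's query interface and are not MongoDB's or CouchDB's actual API.

```python
from collections import defaultdict

# Documents in one collection need not share the same fields.
documents = [
    {"_id": 1, "type": "post", "author": "alice", "tags": ["nosql", "intro"]},
    {"_id": 2, "type": "post", "author": "bob"},
    {"_id": 3, "type": "comment", "author": "alice", "post_id": 1},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs
            if all(d.get(name) == wanted for name, wanted in criteria.items())]


def build_index(docs, field_name):
    """Map each value of one field to the ids of the documents holding it."""
    index = defaultdict(list)
    for d in docs:
        if field_name in d:
            index[d[field_name]].append(d["_id"])
    return index


alice_posts = find(documents, type="post", author="alice")
author_index = build_index(documents, "author")
```

With the index in place, a lookup by author no longer has to scan every document, which is the point of letting the store see inside its values.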
|
|
|
Term
What are the basics of column databases? |
|
Definition
- Entries held in rows
- Tables define a set of "column families"
- Rows contain 0 or more columns for each column family
- No schema (the columns within a family can differ per row)
- On querying:
- Key lookup is fast
- Batch processing via mapreduce (OLAP lives here)
Examples:
- Primarily for batch processing
- HBase
- Uses HDFS for storage, hadoop for processing
- Built to favour consistency over availability
- Cassandra
- Supports key ranges
- Works over a variety of processing architectures (Hadoop, Storm, etc.)
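The row / column family / column layout can be pictured as nested maps. The sketch below is an assumption-laden toy model (plain dicts, hypothetical row keys and family names), not the storage format of HBase or Cassandra; it only shows why key lookups and key-range scans are the natural access paths.

```python
from bisect import bisect_left

# row key -> column family -> column -> value
# Families are fixed per table; the columns inside them vary per row.
table = {
    "user:001": {"profile": {"name": "alice"},
                 "activity": {"last_login": "2013-01-05"}},
    "user:002": {"profile": {"name": "bob", "email": "bob@example.org"},
                 "activity": {}},
    "user:003": {"profile": {"name": "carol"},
                 "activity": {"last_login": "2013-02-11"}},
}


def get(row_key, family, column):
    """Key lookup: three map accesses, no scanning."""
    return table.get(row_key, {}).get(family, {}).get(column)


def scan(start_key, stop_key):
    """Key-range scan over sorted row keys (stop_key is exclusive)."""
    keys = sorted(table)
    return keys[bisect_left(keys, start_key):bisect_left(keys, stop_key)]


bob_email = get("user:002", "profile", "email")
alice_email = get("user:001", "profile", "email")  # column absent in this row
first_two = scan("user:001", "user:003")
```

Anything that is not a key lookup or a range scan, such as "all users without an email", means touching every row, which is why batch processing is the usual companion.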
|
|
|
Term
What are some characteristics of aggregates? |
|
Definition
- These DBs represent a variety of "aggregate" databases
- Columns, Key-values, documents
- They store some form of self-contained thing that is useful in isolation
- A document in MongoDB
- The column families in HBase
- The values in Riak
- Many leverage this for scale
- It completely sidesteps the sharding issues in RDBMS
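Because an aggregate is useful in isolation, a whole aggregate can live on a single machine chosen from its key alone. The sketch below assumes a simple hash-modulo placement over three hypothetical nodes; real systems typically use more elaborate schemes (for example consistent hashing), so treat this as an illustration of the idea only.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

# Each node holds its own independent key -> aggregate map.
cluster = {name: {} for name in NODES}


def node_for(key: str) -> str:
    """Pick a node deterministically from the key alone."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def put(key, aggregate):
    cluster[node_for(key)][key] = aggregate


def get(key):
    # Only one node is contacted; no cross-node join is ever needed.
    return cluster[node_for(key)].get(key)


orders = {
    "order:1": {"customer": "alice", "items": [{"product": "pen", "qty": 2}]},
    "order:2": {"customer": "bob", "items": [{"product": "ink", "qty": 1}]},
    "order:3": {"customer": "carol", "items": []},
}
for key, aggregate in orders.items():
    put(key, aggregate)

fetched = get("order:2")
stored_total = sum(len(node_data) for node_data in cluster.values())
```

The order and its line items travel together, so reading one order never requires data from a second node.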
|
|
|
Term
What are some basics on Graph DBs? |
|
Definition
- Focus on modelling the data's structure
- Graphs are composed of Vertices and Edges
- Queried with a graph traversal API or a graph query language (Cypher, SPARQL)
- Can be much faster at querying graph-like data structures
- Like friends of friends or web links
- Examples
- Neo4j
- Not distributed
- ACID transactions
- Cypher for query
- 4Store (5Store, other triple stores)
- RDF and Semantic Web technologies
- 5Store supports 1000s of machines easily
- SPARQL for query
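The "friends of friends" query is a two-hop traversal. A minimal sketch over an adjacency-set graph follows; the sample data and the `friends_of_friends` helper are hypothetical, and a graph database would express the same thing in its own query language rather than in application code.

```python
# Vertices are people; an edge means "is friends with" (undirected here).
edges = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: not the person, not a direct friend."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


alice_fof = friends_of_friends(edges, "alice")
erin_fof = friends_of_friends(edges, "erin")
```

Each hop follows stored edges directly, whereas the relational equivalent is a self-join on a friendship table per hop.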
|
|
|
Term
Give a definition of ACID |
|
Definition
- Atomic - Entire transaction succeeds or the entire transaction rolls back
- Consistent - A transaction must leave the database "valid"
- Isolated - Concurrent transactions behave as though they occurred serially
- Durable - Once committed, transactions survive power loss, etc
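Atomicity and consistency can be observed with a small SQLite example. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are assumptions made for illustration; the point is that a transfer which would violate the constraint leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # the 'valid state' rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would make alice negative
final = balances(conn)
```

In the failed transfer the credit to `bob` is applied first and then undone by the rollback, so the balances end up exactly as the successful transfer left them.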
|
|
|
Term
Give a definition of CAP |
Definition
- Consistent - Writes are atomic, all subsequent requests retrieve the new value
- Available - The database will always return a value as long as the server is running
- Partition tolerant - The system will still function even if the cluster network is partitioned (i.e. the cluster loses contact with parts of itself)
|
|
|
Term
What is an alternative to ACID? |
|
Definition
- If we want partition tolerance (the P in CAP), ACID can be too restrictive; BASE is an alternative
- Basic Availability - The application works basically all the time
- Soft state - Does not have to be consistent all the time
- Eventual consistency - But will be in some known state eventually
|
|
|
Term
What is stated by eventual consistency? |
|
Definition
- A workaround for CAP
- From Amazon's Dynamo paper: "The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value"
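The guarantee can be sketched with two replicas that accept writes independently and reconcile later. This toy model assumes last-write-wins on a caller-supplied logical timestamp and a manual `sync` step; both are simplifications chosen for the example, and real stores use other reconciliation strategies as well.

```python
class Replica:
    """One copy of the data; each entry remembers when it was written."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Assumed policy: keep whichever write carries the newer timestamp.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                timestamp, value = src.data[key]
                dst.write(key, value, timestamp)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)
sync(r1, r2)

r2.write("cart", ["book", "pen"], timestamp=2)  # update reaches r2 only
stale = r1.read("cart")                         # r1 still serves the old value

sync(r1, r2)                                    # no new updates, replicas talk
converged = (r1.read("cart"), r2.read("cart"))
```

Between the second write and the second `sync`, the two replicas disagree; once updates stop and the replicas exchange state, every read returns the last updated value.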
|
|
|
Term
Explain Multi-Version Concurrency Control (MVCC) in Eventual Consistency |
|
Definition
- Some document DBs use multi-version concurrency control (MVCC)
- Like a version control system
- Writes without locks
- Multiple versions of documents
- Distributed versions on different machines
- Collisions detected during replication
- App developer can be informed of collisions and decide how to resolve them
- Used by CouchDB
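The version-control analogy can be made concrete with a tiny revision tree. The sketch below tracks a single document per replica, and the `generation-hash` revision ids are an illustrative format only; none of this is CouchDB's actual implementation. Two replicas write without locks, and the collision only becomes visible when they replicate.

```python
import hashlib


def new_rev(parent, body):
    """Build a revision id from the parent's generation and a content hash."""
    generation = int(parent.split("-")[0]) + 1 if parent else 1
    digest = hashlib.sha256(repr((parent, body)).encode("utf-8")).hexdigest()[:8]
    return f"{generation}-{digest}"


class Replica:
    """Holds every known revision of one document (hypothetical model)."""

    def __init__(self):
        self.revisions = {}  # revision id -> (parent revision id, body)

    def write(self, body, parent=None):
        # No lock is taken: a write just adds a new version.
        rev = new_rev(parent, body)
        self.revisions[rev] = (parent, body)
        return rev

    def leaves(self):
        """Revisions that no other revision builds on."""
        parents = {p for p, _ in self.revisions.values() if p}
        return sorted(r for r in self.revisions if r not in parents)

    def replicate_from(self, other):
        """Copy the other replica's versions; several leaves mean a collision."""
        self.revisions.update(other.revisions)
        return self.leaves()


a, b = Replica(), Replica()
rev1 = a.write("draft")
b.replicate_from(a)

# Both machines edit the same revision independently.
rev2a = a.write("draft, edited on A", parent=rev1)
rev2b = b.write("draft, edited on B", parent=rev1)

before = a.leaves()
conflicts = a.replicate_from(b)
```

After replication `a` holds two leaf revisions with the same parent; the application can be told about both and choose or merge a winner.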
|
|
|
Term
Explain Vector Clocks in Eventual Consistency |
|
Definition
- An extension of Lamport timestamps
- They help understand the order of events in a distributed system
- Vector clocks can be used to
- Identify the provenance of an item of data
- Decide the order in which data was changed
- Help resolve conflicts
- Flag inconsistencies for app-specific decisions
- Used by Amazon's Dynamo and Riak
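A vector clock can be sketched as a map from node name to counter. The function names `increment`, `merge`, and `compare` and the node names are hypothetical; the comparison rule is the standard one: a clock happened before another if every counter is less than or equal and the clocks differ, and two clocks are concurrent if neither dominates.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the clock that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "A")   # first write, handled by node A
v2 = increment(v1, "B")   # an update of v1 handled by node B
v3 = increment(v1, "C")   # a different update of v1 handled by node C

ordered = compare(v1, v2)
conflict = compare(v2, v3)
resolved = increment(merge(v2, v3), "A")  # clock for the reconciled value
```

Here `v1` is an ancestor of `v2`, so the newer value can simply replace it, while `v2` and `v3` are concurrent and must be flagged for an app-specific decision. The reconciled value's clock descends from both.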
|
|
|