Term
What is the meaning of Parallelism and parallel databases? |
|
Definition
- An arrangement or state that permits several operations or tasks to be performed simultaneously rather than consecutively
- Parallel databases
- Have the ability to split
- Processing data
- Access to data
- Accross multiple processors
|
|
|
Term
What are some benefits of using parallel databases? |
|
Definition
- Hardware trends
- Reduced elapsed time for queries
- Increased transaction throughput
- Incresed scalability
- Better price/performance
- Improved application availability
- Access to more data
- In short, for better performance
|
|
|
Term
What are the characteristics of software shared memory? |
|
Definition
- Tightly coupled
- Symmetric Multiprocessor (SMP)
- Less complex database software
- Limited scalability
- Single buffer
- Single database storage
|
|
|
Term
What are the characteristics of shared disk architecture? |
|
Definition
- Loosely coupled
- Distributed memory
- Avoids memory bottleneck
- Same page may be in more than one buffer at once - can lead to incoherence
- Needs global locking mechanism
- Single logical database storage
- Each processor has its own database buffer
|
|
|
Term
What are the characteristics of shared-nothing architecture? |
|
Definition
- Massively parallel
- Loosely coupled
- High speed interconnect (between processors)
- One page is only local buffer - no buffer incoherence
- Needs distributed deadlock detection
- Needs multiphase commit protocol
- Needs to break SQL requests into multiple sub-requests
- Each processor has its own database buffer
- Each processor owns part of the data
|
|
|
Term
What are the comparisons between hardware and Software architecture? |
|
Definition
- It is possible to use one software strategy on a different hardware arrangement
- Also possible to simulate one hardware configuration on another
- Virtual shared disk (VSD) makes an IBM SP shared nothing system look like a shared disk setup (for oracle)
- From this point on, we deal only with shared nothing software mode
|
|
|
Term
What are some challenges of the shared nothing architecture? |
|
Definition
- Partitioning the data
- Keeping the partitioned data balanced
- Splitting up the queries to get the work done
- Avoiding distributed deadlock
- Concurrency control
- Dealing with node failure
|
|
|
Term
What are some characteristics of transaction parallelism? |
|
Definition
- Improves throughput
- Inter-transaction
- Run many transactions in parallel, sharing the DB
- Often referred to as Concurrency
|
|
|
Term
What are the characteristics of query parallelism |
|
Definition
- Improves response times (lower latency)
- Intra operator (horizontal) parallelism
- Operators decomposed into independent operator instances, which perform the same operation on different subsets of data
- Inter-operator (vertical) parallelism
- Operations are overlapped
- Pipeline data from one stage to the next without materialisation
|
|
|
Term
What are the benefits of function shipping? |
|
Definition
- Database operations are performed where the data is, as far as possible
- Network traffic is minimized
- For basic database operators, code developed for serial implementations can be reused
- In practice, mixture of function shipping and IO shipping has to be employed
|
|
|
Term
What are some characteristics of basic operators? |
|
Definition
- The DBMS has a set of basic operations which it can perform
- The result of each operation is another table
- Scan a table - sequentially or selectively via an index - to produce another table
- Join two table together
- Sort a table
- Perform aggretation functions (sum, average, count)
- Those are combied to execute a query
|
|
|
Term
What are some characteristics of the Volcano Architecture? |
|
Definition
- The volcano query processing system was devised by Goetze Graefe (Oregon)
- Introduced the exchange operator
- Inserted between the steps of a query to:
- Pipeline results
- Direct streams of data to the next step(s), redistributing as necessary
- Provides mechanism to support both vertical and horizontal prallelism
- Informix 'Dynamic Scalable Architecture' based on volcano
|
|
|
Term
What are some parallel queries? |
|
Definition
- Enquiry (i.e. how many customer live in UK)
- Collocated Join (Which customers placed order in july?)
- Directed Join (Which customer placed orders in July? - Tables have differnt keys)
- Broadcast Join ( Which customers and suppliers are in the same country?)
- Repartitioned Join (Which customers and suppliers are in the same country?)
|
|
|
Term
Enquiry (i.e. how many customer live in UK) |
|
Definition
|
|
Term
- Collocated Join (Which customers placed order in july?)
|
|
Definition
|
|
Term
Directed Join (Which customer placed orders in July? - Tables have differnt keys) |
|
Definition
|
|
Term
Broadcast Join (Which customers and suppliers are in the same country?) |
|
Definition
|
|
Term
Repartitioned/Parallel-Hash Join (Which customers and suppliers are in the same country?) |
|
Definition
|
|
Term
What are some characteristics on Query handling? |
|
Definition
- Critical aspect of Parallel DBMS
- User wants to issue simple SQL, and not be concerned with parallel aspects
- DBMS needs structured and facilitates to parallelise queries
- Good optimizer technology is essential
- As database grows, effects (good or bad) of optimiser become increasingly significant
- A single transaction may update data in several different places
- Multiple transactions may be using the same distributed tables simultaneously
- One or several nodes could fail
- Requires concurrency control and recovery across multiple nodes for:
- Locking and deadlock detection
- Two-phase commit to ensure 'all or nothing'
|
|
|
Term
What are some characteristics of locking and deadlocks? |
|
Definition
- With shared nothing architecture, each node is responsible for locking its own data.
- No global locking mechanism
- However
- T1 locks item A on Node 1 and wants item B on Node 2
- T2 locks item B on Node 2 and wants item A on Node 1
- = Distributed Deadlock
|
|
|
Term
What are some approaches to resolving deadlocks? |
|
Definition
- One approach: Timeouts
- Timeout T2, after wait exceeds a certain interval
- Interval may need random element to avoid 'chatter' (i.e. both transactions give up at the same time and then try again
- Rollback T2 to let T1 to proceed
- Restart T2, which can now complete
|
|
|
Term
What are some other approaches to resolving deadlocks? |
|
Definition
- More sophisticated approach (DB2)
- Each node maintains a local 'wait-for' graph
- Distributed deadlock detector (DDD) runs at the catalogue node for each database
- Periodically all nodes send their graphs to the DDD
- DDD records all locks found in wait state
- Transaction becomes a candidate for termination if found in same lock wait state on two successive iterations
|
|
|
Term
What are some characteristics of reliability? |
|
Definition
- We wish to preserve the ACID properties for parallelised transactions
- Isolation is taken care of by 2PL protocol
- Isolation implies consistency
- Durability can be taken care of node-by-node, with proper logging and recobery routines
- Atomicity is the hard part. We need to commit all parts of a transaction or abort all parts
- Two-phase commit protocol (2PC) is used to ensure that atomicity is preserved
|
|
|
Term
What are the main characteristits of Two-phase commit (2PC)? |
|
Definition
- Distinguish between:
- The global transaction
- The local transactions into which the global transaction is decomposed
- Global transaction is managed by a single site, known as the coordinator
- Local transactions may be executed on separate sites, known as the participants
|
|
|
Term
What are the steps of 2 Phase Commits (2PC)? |
|
Definition
- Phase 1: Voting
- Coordinator sends "prepare T" message to all participants
- Participants repond with either "Vote-commit T" or "Vote-abort T"
- Coordinator waits for participants to resond within a timeout period
- Phase 2: Decision
- If all participants return "vote-commit T" (to commit), send "Commit T" to all participants. Wait for acknowledgements within timeout period.
- If any participants returns "vote-abort T", send "abort T" to all participants. Wait for acknowledgements within timeout period
- When all acknowledgements received, transaction is completed
- If a site does not acknowledge, resend global decision until it is acknowledged
|
|
|
Term
What are some characteristics of parallel utilities? |
|
Definition
- Ancillary operations can also exploit parallel hardware
- Parallel data loading/import/export
- Parallel index creation
- Parallel Rebalancing
- Parallel Backup
- Parallel Recovery
|
|
|