Term
What are the characteristics of Localisation in Query Processing |
|
Definition
- Fragmentation expressed as relational algebra expressions
- Global relations can be reconstructed using these expression (A localisation program)
- Naively, generate distributed query plan by substituting localisation programs for relations (Use reduction techniques to optimise queries)
|
|
|
Term
What are the characteristics of Reduciton for Horizontal Fragmentation in Query Processing |
|
Definition
- Given a relation R fragmented as FR={R1,R2,...,Rn}
- Localization program is R=R1U R2 U ... U Rn
- Reduce by identifying fragments of localized query that give empty relations
- Two classes to consider:
- Reduction with selection
- Reduciton with join
|
|
|
Term
What are the characteristics of Horizontal Selection Reduction in Query Processing |
|
Definition
Given horizontal fragmentation of R such that
Rj=opj(R): σp(Rj) = ∅ if ∀x∈R, ¬(p(x) ∧ pj(x))
[image] |
|
|
Term
What are the characteristics of Horizontal Join Reduction in Query Processing |
|
Definition
- Recalling that join distributes over unions:
- (R1 ∪ R2) ⨝ S ≣ (R1 ⨝ S) ∪ (R2 ⨝ S)
- Given fragments Ri and Rj defined with predicates pi and pj:
- Ri ⨝ Rj = ∅ if ∀x∈Ri, ∀y∈Rj ¬(pi(x) ∧ pj(y))
[image] |
|
|
Term
What are the characteristics of Reduction for Vertical Fragmentation in Query Processing |
|
Definition
Given a relation R fragmented as FR = {R1, R2, ..., Rn}
Localization program is R = R1 ⨝ R2 ⨝ ... ⨝ Rn
Reduce by identifying useless intermediate relations.
One case to consider is reduction with projection. |
|
|
Term
What are the characteristics of Vertical Projection Reduction in Query Processing |
|
Definition
Given a relation R with attributes A = {a1, a2, ..., an} vertically fragmented as Ri = πAi(R) where Ai ⊆ A
πD,K(Ri) is useless if D ⊈ Ai
[image] |
|
|
Term
Where would we compute the join between relations R and S when these are stored in different sites? |
|
Definition
- It is possible to move one relation to the other site and perform the join there
- CPU cost of performing join same regardless of site
- Communication costs depend on size of hte relation being moved
- CostCOM = size(R) = cardinality(R) * length(R)
- Move the relation with the smallest size to the other's site
|
|
|
Term
What are the characteristics of Semijoins in Query Processing |
|
Definition
R ▷p S ≣ πR(R ⨝p S) ≣ πR(R ⨝p πp(S))
Where πp(S) projects out from S only the attributes used in predicate p
This reduces communication cost by only moving that part of a relation that will be used in the join
Recall: R ▷p S ≣ πR(R ⨝p S) where p is a predicate defined over R and S and πR projects out only those attributes from R Then:
size(R ▷p S) < size(R ⨝p S)
This is because:
R ⨝p S
≣ (R ▷p S) ⨝p S
≣R ⨝p (R ◁p S)
≣ (R ▷p S) ⨝p (R ◁p S) |
|
|
Term
What are the steps to perform Semijoin Reduction with R in site 1 and S in site 2? |
|
Definition
- Site 2 sends πp(S) to site 1
- Site 1 calculates R ▷p S ≣ πR(R ⨝p πp(S))
- Site 1 sends R ▷p S to site 2
- Site 2 calculates R ⨝p S ≣ (R ▷p S) ⨝p S
So Costcom= size(πp(S)) + size(R ▷p S)
This approach is better if:
size(πp(S)) + size(R ▷p S) < size(R) |
|
|
Term
What are the principles of distributed transactions? |
|
Definition
Transaction processing may be spread across several sites in the distributed database
- The site from which the transaction originated is known as the coordinator
- The sites on which the transaction is executed are known as the participants
|
|
|
Term
What are the principles on Distribution and ACID on DDBMSs? |
|
Definition
- Non-distributed databases aim to maintain isolation
- Isolation: a transaction should not make updates externally visible until committed
- Distributed databases use two-phase locking (2PL) to preserve isolation
- 2PL ensures serialisability, the highest isolation level
|
|
|
Term
What does two phase locking consist of? |
|
Definition
- Two phases (Duh)
- Growing phase: obtain locks, access data items
- Shrinking phase: release locks
- Guarantees serializable transactions
|
|
|
Term
Show an image of the growing and shrinking phase of 2pl |
|
Definition
|
|
Term
How is locking controlled in a non-distributed and distributed database |
|
Definition
- Former: Locking is controlled by a lock manager
- Latter: Two main approaches to implementing 2pl:
- Centralized 2PL (C2PL)
- Responsibility for lock management lies with a single site
- Distributed 2PL (D2PL)
- Each site has its own lock manager
|
|
|
Term
What are the main characteristics of Centralised Two-Phase Locking (C2PL)? |
|
Definition
- Coordinating sire runs transaction manager TM
- Participant sites run data processors DP
- Lock manager LM runs on central site
- TM requests locks from LM
- If granted, TM submits operations to processors DP
- When DP finish, TM sends message to LM to release locks
|
|
|
Term
Show an image of how two-phase locking works |
|
Definition
|
|
Term
What are the disadvantages of Centralized two-phase locking? |
|
Definition
LM is a single point of failure (less reliable)
LM is a bottleneck (Affects transaction throughput) |
|
|
Term
How does Distributed Two-Phase Locking work? |
|
Definition
- Coordinating site C runs TM
- Each participant runs both an LM and a DP
- TM sends operations and lock requests to each LM
- If lock can be granted, LM forwards operation to local DP
- DP sends "end of operation" to TM
- TM sends message to LM to release locks
- Variant:
- DPs may send "end of operation" to their own LM
- LM releases lock and informs TM
|
|
|
Term
Show an image of how Distributed Two-phase locking works |
|
Definition
|
|
Term
What are the characteristics and conditions for deadlock? |
|
Definition
- Exists when two or more transactions are waiting for each other to release a lock on an item
- Three conditions must be satisfied for a deadlock to occur:
- Concurrency: Two transactions claim exlusive control of one resource
- Hold: one transaction continues to hold exclusively controlled resources until its need is satisfied
- Wait: transactions wait in queues for additional resources while holding resources already allocated
|
|
|
Term
What is a wait-for graph? |
|
Definition
- Representation for interactions between transactions
- Directed graph containing:
- A vertex for each transaction that is currently executing
- An edge from T1 to T2 if T1 is waiting to lock an item that is currenly locked by T2
- Deadlock exists ifff the WFG contains a cycle. Ex:
T1 -> T2 -> T3 -> T1 |
|
|
Term
What are the types of Wait For graph? |
|
Definition
- Local: One per site, only considers transactions on that site
- Global: union of all LWFGs
Deadlock may occur on a single site (within LWFG) or between sites (within GWFG) |
|
|
Term
How is it possible to manage distributed deadlocks? |
|
Definition
- Prevention (Pre-declaration, etc)
- Avoidance (Resource Ordering, Transaction Prioritization)
- Detection and Resolution
|
|
|
Term
What are the characteristics of prevention for DDeadlocks? |
|
Definition
Guarantees that deadlocks cannot occur in the first place
- Transaction pre-declares all data items that it will access
- TM checks that locking data items will not cause deadlock
- Proceed to lock only if all data items are available (unlocked)
It is difficult to know in advance which data items will be accessed by a transaction |
|
|
Term
What are the characteristics of avoidance for DDeadlocks? |
|
Definition
- Two main sub-approaches
- Resource ordering (Concurrency controlled such that deadlocks won't happen)
- All resources (data items) are ordered
- Transactions always access resources in order
- Transaction prioritisation (Potential deadlocks detected and avoided)
- Each transaction has timestamp that corresponds to the time it started: ts(T)
- Transactions are prioritised using these timestamps
- When lock request is denied, use priorities to choose a transaction to abort (WAIT-DIE and WOUND-WAIT rules)
|
|
|