Term
Why are decentralised P2P systems desirable? |
|
Definition
- To mitigate attacks, legal challenges, government intervention, or the central component simply being turned off
- Reduce large loads on specific download/update servers
- Many hobbyists don't have access to big servers with fat bandwidth
- Exploit millions of devices in home networks
|
|
|
Term
What are the goals of a Distributed Hash Table? |
|
Definition
- A hash table stored across a large set of nodes
- Store K keys across N nodes
- Seek to build a distributed P2P index where
- Each node keeps a small amount of information about other nodes
- Intentionally do not have whole index at all nodes
- Nodes can look up entries in the index quickly
- Each node can use the index at the same time, and as nodes join and leave
- This allows performance to grow with the number of nodes
- The goal of the DHT is to be able to find the node responsible for holding data about a given key
- Data is location of the content peer(s), as per a tracker
|
|
|
Term
What are some considerations of Distributed Hash Tables (DHT)? |
|
Definition
- Scalable - Potentially millions of nodes
- Robust to faults and attacks - implies some replication within the system
- Allow nodes to come and go freely - Without refreshing the keys
- Every node can be used to find the answer (Key, value) - want to be able to start a query at any node
- The (key, value) is not placed on an arbitrary node, but a node identified by a hash of the key
- The hash function should have uniform distribution
- Don't want many keys on one node and none on others
|
|
|
Term
What are some DHT solutions available? |
|
Definition
- Chord
- CAN
- Pastry
- Tapestry
|
|
|
Term
How is a DHT used for P2P content distribution? |
Definition
- Can provide a way for a client to find a key in a collection of distributed nodes
- And then the client can start talking to those peer(s) to fetch/offer the content (chunks) it is interested in
- Stores the peer as the value in a (key, value) pair, and stores that key at some node in the DHT
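- A minimal Python sketch of this idea (the dht.get/dht.put interface and names are hypothetical, not a real library API):
    import hashlib

    def announce(dht, torrent_bytes, my_addr):
        key = hashlib.sha1(torrent_bytes).digest()   # infohash identifying the content
        peers = dht.get(key) or []                   # current peer list, if any
        dht.put(key, peers + [my_addr])              # add ourselves as a peer for this content
        return peers                                 # peers we can start exchanging chunks with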
|
|
|
Term
What are the principles of Chord? |
|
Definition
- At any time, a DHT instance will have N active nodes
- Each node runs on an IP address/port
- Can be IPv4 or IPv6 addresses
- Each node stores part of the overall index
- For N active nodes and K keys, each node is responsible for on average K/N keys
- Chord uses an m-bit number space for identifiers and keys
- SHA-1 is typically the hash function (so generates 160-bit hash)
- Node IP address (and port) hashed to the node identifier
- In practice the identifier is the hash modulo 2^m
- m should be large enough to make collisions improbable
- Key (perhaps a torrent file) hashed to the key identifier
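- A small Python sketch of this mapping (SHA-1 reduced modulo 2^m; illustrative only):
    import hashlib

    def chord_id(data: bytes, m: int = 160) -> int:
        """Map data to an m-bit Chord identifier (hash modulo 2^m)."""
        h = int.from_bytes(hashlib.sha1(data).digest(), "big")
        return h % (2 ** m)

    node_id = chord_id(b"192.0.2.1:6881")    # node identifier from IP address and port
    key_id = chord_id(b"example.torrent")    # key identifier from the key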
|
|
|
Term
How are keys mapped to nodes in Chord? |
|
Definition
- Chord maps the node identifier space to a circle with 2^m potential node locations
- There can thus be up to 2^m active nodes in the system
- This also means that node 0 is adjacent to node (2^m)-1
- Chord maps keys to the same circular space
- Uses the same hash function (modulo 2^m)
- Each node has a predecessor (Anti-clockwise) and a successor (clockwise) in the form of a node identifier and an IP address/port
- Key k allocated to first active node with an equal or higher identifier
- The node n that becomes responsible for k is thus at position successor(k) in the ring
- That node may have a bucket (list) with multiple keys
- Not every potential node location is active
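- A minimal Python sketch of successor(), run over a plain sorted list of active node identifiers rather than the distributed protocol:
    def successor(k: int, active_nodes: list[int]) -> int:
        """First active node at or clockwise after identifier k on the ring."""
        for n in sorted(active_nodes):
            if n >= k:
                return n
        return min(active_nodes)   # wrap past 2^m - 1 back to the lowest identifier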
|
|
|
Term
Give an abstract explanation of Chord |
|
Definition
- You want to find the value held by a given key k
- Your client can query at least one node in the DHT
- The value is held by the node n whose identifier is successor(k)
- But you don't know the IP/port of n to connect directly to it
- If all nodes held the locations of all other nodes, it would be easy
- But there is a big cost to doing that and to keeping it up to date and consistent
- Instead, each active node keeps just the locations (IP/port) of a small number of other nodes, and you query through the nodes to resolve the lookup
- Nodes know of their predecessor, successor and finger table entries
|
|
|
Term
Give an example of Chord successor |
|
Definition
|
|
Term
How does a node join a Chord ring? |
Definition
- Join function assumes an existing Chord ring is in place
- A small number of co-operating nodes can initiate a ring, with successor and predecessor information
- A node wishing to join must know about at least one node in the system
- If node n joins, it can ask any node for successor(n)
- Can then notify successor(n) that it is its new predecessor and successor(n)'s former predecessor becomes the predecessor of n
- Then the relevant key information needs to be transferred
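- A rough Python sketch of these join steps (the node objects and the in_ring_range() helper are hypothetical):
    def join(n, existing):
        """n joins via any known node 'existing'."""
        n.successor = existing.find_successor(n.id)   # ask any node for successor(n)
        n.predecessor = n.successor.predecessor       # take over the former predecessor
        n.successor.predecessor = n                   # notify successor of its new predecessor
        n.predecessor.successor = n
        for k in list(n.successor.keys):              # transfer keys now owned by n
            if in_ring_range(k, n.predecessor.id, n.id):
                n.keys[k] = n.successor.keys.pop(k)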
|
|
|
Term
Show a diagram for transferring keys when joining a new node |
|
Definition
|
|
Term
What happens when a node leaves? |
|
Definition
- All keys held by the node need to be passed to its successor
- If node 10 leaves, keys that hash to identifiers 9 and 10 need to be managed by node 14
- No problem if node leaves gracefully
- Messages can be exchanged to ensure keys are passed
- On average move K/N keys
- If it does not leave gracefully, there is a problem
- Each node maintains a successor list of its r nearest successors (it also keeps a list of r predecessors)
- So, for example, if its direct successor leaves abruptly it can contact the next successor in the list
- Can verify its list by asking its successor for its list
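- A small Python sketch of this failover (the ping() liveness check is hypothetical):
    def current_successor(node):
        """Fall back along the successor list if the direct successor has gone."""
        for s in node.successor_list:        # r nearest successors, nearest first
            if s.ping():                     # first one still reachable becomes the successor
                node.successor = s
                return s
        raise RuntimeError("all r successors unreachable")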
|
|
|
Term
What does a Chord finger table consist of? |
|
Definition
- Makes DHT search more efficient
- Form of 'routing' through identifier space
- Not required, but improves efficiency significantly
- A node's finger table has m entries, indexed 0...m-1
- Each entry identifies another active node
- Each entry has two fields
- To set the value for entry i in the table for node n:
- start = (n + 2^i) modulo 2^m
- Then entry i = successor(start(i))
- The entry holds both the identifier and IP/port of the node
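- A Python sketch of building the table, reusing the successor() sketch from the earlier card (identifiers only; real entries also carry IP/port):
    def build_finger_table(n: int, active_nodes: list[int], m: int) -> list[int]:
        """Finger table of node n: entry i points at successor((n + 2^i) mod 2^m)."""
        table = []
        for i in range(m):                       # entries indexed 0 .. m-1
            start = (n + 2 ** i) % (2 ** m)      # start(i) = (n + 2^i) mod 2^m
            table.append(successor(start, active_nodes))
        return table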
|
|
|
Term
How is the finger table used? |
|
Definition
- Looking up the location of a key k starting at node n
- Node uses its finger table to route the request towards the closest preceding node of k
- If key identifier k is between n and successor(n)
- Then key is held by successor(n)
- Otherwise examine the finger table to find the entry to advance in search towards k
- Request is then forwarded to that node
- Essentially the search moves towards the target, with the finger table typically allowing a node to forward its query halfway along the remaining distance
- Average number of lookups is log2(N)
- Improves on average of N/2 for a simple search
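- A Python sketch of the lookup loop as a single-process simulation (the in_interval() ring test is a hypothetical helper):
    def find_successor(node, k):
        """Route towards key k through the finger tables."""
        while not in_interval(k, node.id, node.successor.id, inclusive_right=True):
            node = closest_preceding_finger(node, k)   # jump roughly halfway to the target
        return node.successor                          # this node is responsible for k

    def closest_preceding_finger(node, k):
        for entry in reversed(node.fingers):           # highest (furthest) finger first
            if in_interval(entry.id, node.id, k):
                return entry
        return node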
|
|
|
Term
Show an example of a Chord finger table lookup |
|
Definition
|
|
Term
What are some characteristics of Chord stabilisation? |
|
Definition
- It is important that successor information is up to date
- Each node periodically runs a process to verify its successor data and its finger table
- e.g. a node n will ask its successor what it believes its predecessor p to be; if a different value, then p becomes a candidate for n's new successor
- This also allows n's successor to update its predecessor to be n if necessary, for example if it no longer has a predecessor p
- The stabilisation messages run independently of any key lookup queries between nodes
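- A Python sketch of this exchange (method names are illustrative; in_interval() is the same hypothetical ring test as before):
    def stabilise(n):
        """Periodic check of n's successor."""
        x = n.successor.predecessor
        if x is not None and in_interval(x.id, n.id, n.successor.id):
            n.successor = x                 # a closer node has appeared; adopt it as successor
        n.successor.notify(n)               # tell the successor that n may be its predecessor

    def notify(n, candidate):
        if n.predecessor is None or in_interval(candidate.id, n.predecessor.id, n.id):
            n.predecessor = candidate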
|
|
|
Term
Why avoid .torrent files and use Magnet links? |
|
Definition
- Torrent files specify the content and where the tracker is
- Magnet links
- Hash code of torrent
- Unique for a given torrent (and thus the content it points to)
- DHT used to find values that are the IPs of peers
- Has been The Pirate Bay's default for a while now
- Can find further peers with Peer Exchange (PEX)
- Nodes ask peers to report their peers
- Link is simply a URI including a hash
- Hash is entry point to a DHT
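- The general shape of a BitTorrent magnet URI (values are placeholders):
    magnet:?xt=urn:btih:<40-hex-char-infohash>&dn=<display-name>&tr=<optional-tracker-url>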
|
|
|
Term
What are the principles of Gnutella? |
|
Definition
- Another P2P approach
- Forms a dynamic overlay network
- Each node is aware of and talks to a set of other nodes
- Nodes may come and go
- Uses message flooding
|
|
|
Term
What are the principles of Gnutella flooding? |
|
Definition
- Original Gnutella floods queries to find content
- A node queries all active neighbours in the overlay to find content corresponding to a particular key
- If neighbour has answer, responds
- Otherwise, neighbour forwards to its neighbours
- There is a TTL on the queries
- Decremented each time query is forwarded
- Generates a lot of messages and many hops
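- A Python sketch of TTL-limited flooding (node objects are hypothetical; duplicate suppression is covered on the next card):
    def query(node, key, ttl=7, origin=None):
        """Flood a query through the overlay until the TTL expires or the content is found."""
        if key in node.content:
            return node                      # this node has the content: respond
        if ttl == 0:
            return None                      # query dies out
        for neighbour in node.neighbours:
            if neighbour is not origin:
                hit = query(neighbour, key, ttl - 1, origin=node)
                if hit is not None:
                    return hit               # response back-propagates along the reverse path
        return None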
|
|
|
Term
What does Gnutella flooding consist of? |
|
Definition
- Sends a ripple of queries through the overlay
- Maximum TTL typically 7
- Nodes must note which queries they have seen to avoid loops
- Initial neighbours bootstrapped from a well-known list
- Node builds a list of clients (to max size) by querying other clients it knows
- Concept of 'ultra peers' was added to broaden reach
|
|
|
Term
How are Gnutella query responses handled? |
|
Definition
- Responses traverse back to the originator
- Back-propagation process
- Response follows the reverse path of the query
- Can provide some level of anonymity
|
|
|