Term
Why are decentralised P2P systems desirable? |
|
Definition
- To mitigate attacks, legal challenges, government intervention, or the central component simply being turned off
- Reduce large loads on specific download/update servers
- Many hobbyists don't have access to big servers with fat bandwidth
- Exploit millions of devices in home networks
|
|
|
Term
What are the goals of a Distributed Hash Table? |
|
Definition
- A hash table stored across a large set of nodes
- Store K keys across N nodes
- Seek to build a distributed P2P index where
- Each node keeps a small amount of information about other nodes
- Intentionally do not have whole index at all nodes
- Nodes can look up entries in the index quickly
- Each node can use the index at the same time, and as nodes join and leave
- This allows performance to grow with the number of nodes
- The goal of the DHT is to be able to find the node responsible for holding data about a given key
- Data is location of the content peer(s), as per a tracker
|
|
|
Term
What are some considerations of Distributed Hash Tables (DHT)? |
|
Definition
- Scalable - Potentially millions of nodes
- Robust to faults and attacks - implies some replication within the system
- Allow nodes to come and go freely - Without refreshing the keys
- Every node can be used to find the answer (Key, value) - want to be able to start a query at any node
- The (key, value) is not placed on an arbitrary node, but a node identified by a hash of the key
- The hash function should have uniform distribution
- Don't want many keys on one node and none on others
|
|
|
Term
What are some DHT solutions available? |
|
Definition
- Chord
- CAN
- Pastry
- Tapestry
|
|
|
Term
How is a DHT used for P2P content distribution? |
Definition
- Can provide a way for a client to find a key in a collection of distributed nodes
- And then the client can start talking to those peer(s) to fetch/offer the content (chunks) it is interested in
- Stores the peer as the value in a (key, value) pair, and stores that key at some node in the DHT
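- A minimal Python sketch of this idea (the dht.get/dht.put interface and names are hypothetical, not a real library API):
    import hashlib

    def announce(dht, torrent_bytes, my_addr):
        key = hashlib.sha1(torrent_bytes).digest()   # infohash identifying the content
        peers = dht.get(key) or []                   # current peer list, if any
        dht.put(key, peers + [my_addr])              # add ourselves as a peer for this content
        return peers                                 # peers we can start exchanging chunks with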
|
|
|
Term
What are the principles of Chord? |
|
Definition
- At any time, a DHT instance will have N active nodes
- Each node runs on an IP address/port
- Can be IPv4 or IPv6 addresses
- Each node stores part of the overall index
- For N active nodes and K keys, each node is responsible for on average K/N keys
- Chord uses an m-bit number space for identifiers and keys
- SHA-1 is typically the hash function (so generates 160-bit hash)
- Node IP address (and port) hashed to the node identifier
- In practice the identifier is the hash modulo 2^m
- m should be large enough to make collisions improbable
- Key (perhaps a torrent file) hashed to the key identifier
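- A small Python sketch of this mapping (SHA-1 reduced modulo 2^m; illustrative only):
    import hashlib

    def chord_id(data: bytes, m: int = 160) -> int:
        """Map data to an m-bit Chord identifier (hash modulo 2^m)."""
        h = int.from_bytes(hashlib.sha1(data).digest(), "big")
        return h % (2 ** m)

    node_id = chord_id(b"192.0.2.1:6881")    # node identifier from IP address and port
    key_id = chord_id(b"example.torrent")    # key identifier from the key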
|
|
|
Term
How are keys mapped to nodes in Chord? |
|
Definition
- Chord maps the node identifier space to a circle with 2^m potential node locations
- There can thus be up to 2^m active nodes in the system
- This also means that node 0 is adjacent to node (2^m)-1
- Chord maps keys to the same circular space
- Uses the same hash function (modulo 2^m)
- Each node has a predecessor (Anti-clockwise) and a successor (clockwise) in the form of a node identifier and an IP address/port
- Key k allocated to first active node with an equal or higher identifier
- The node n that becomes responsible for k is thus at position successor(k) in the ring
- That node may have a bucket (list) with multiple keys
- Not every potential node location is active
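- A minimal Python sketch of successor(), run over a plain sorted list of active node identifiers rather than the distributed protocol:
    def successor(k: int, active_nodes: list[int]) -> int:
        """First active node at or clockwise after identifier k on the ring."""
        for n in sorted(active_nodes):
            if n >= k:
                return n
        return min(active_nodes)   # wrap past 2^m - 1 back to the lowest identifier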
|
|
|
Term
Give an abstract explanation of Chord |
|
Definition
- You want to find the value held by a given key k
- Your client can query at least one node in the DHT
- The value is held by the node n whose identifier is successor(k)
- But you don't know the IP/port of n to connect directly to it
- If all nodes held the locations of all other nodes, it would be easy
- But there is a big cost to doing that and to keeping it up to date and consistent
- Instead, each active node keeps just the locations (IP/port) of a small number of other nodes, and you query through the nodes to resolve the lookup
- Nodes know of their predecessor, successor and finger table entries
|
|
|
Term
Give an example of Chord successor |
|
Definition
|
|
Term
How does a node join a Chord ring? |
Definition
- Join function assumes an existing Chord ring is in place
- A small number of co-operating nodes can initiate a ring, with successor and predecessor information
- A node wishing to join must know about at least one node in the system
- If node n joins, it can ask any node for successor(n)
- Can then notify successor(n) that it is its new predecessor and successor(n)'s former predecessor becomes the predecessor of n
- Then the relevant key information needs to be transferred
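- A rough Python sketch of these join steps (the node objects and the in_ring_range() helper are hypothetical):
    def join(n, existing):
        """n joins via any known node 'existing'."""
        n.successor = existing.find_successor(n.id)   # ask any node for successor(n)
        n.predecessor = n.successor.predecessor       # take over the former predecessor
        n.successor.predecessor = n                   # notify successor of its new predecessor
        n.predecessor.successor = n
        for k in list(n.successor.keys):              # transfer keys now owned by n
            if in_ring_range(k, n.predecessor.id, n.id):
                n.keys[k] = n.successor.keys.pop(k)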
|
|
|
Term
Show a diagram for transferring keys when joining a new node |
|
Definition
|
|
Term
What happens when a node leaves? |
|
Definition
- All keys held by the node need to be passed to its successor
- If node 10 leaves, keys that hash to identifiers 9 and 10 need to be managed by node 14
- No problem if node leaves gracefully
- Messages can be exchanged to ensure keys are passed
- On average move K/N keys
- If it does not leave gracefully, there is a problem
- Each node maintains a successor list of its r nearest successors (it also keeps a list of r predecessors)
- So, for example, if its direct successor leaves abruptly it can contact the next successor in the list
- Can verify its list by asking its successor for its list
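- A small Python sketch of this failover (the ping() liveness check is hypothetical):
    def current_successor(node):
        """Fall back along the successor list if the direct successor has gone."""
        for s in node.successor_list:        # r nearest successors, nearest first
            if s.ping():                     # first one still reachable becomes the successor
                node.successor = s
                return s
        raise RuntimeError("all r successors unreachable")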
|
|
|
Term
What does a Chord finger table consist of? |
|
Definition
- Makes DHT search more efficient
- Form of 'routing' through identifier space
- Not required, but improves efficiency significantly
- A node's finger table has m entries, indexed 0...m-1
- Each entry identifies another active node
- Each entry has two fields
- To set the value for entry i in the table for node n:
- start = (n + 2^i) modulo 2^m
- Then entry i = successor(start(i))
- The entry holds both the identifier and IP/port of the node
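- A Python sketch of building the table, reusing the successor() sketch from the earlier card (identifiers only; real entries also carry IP/port):
    def build_finger_table(n: int, active_nodes: list[int], m: int) -> list[int]:
        """Finger table of node n: entry i points at successor((n + 2^i) mod 2^m)."""
        table = []
        for i in range(m):                       # entries indexed 0 .. m-1
            start = (n + 2 ** i) % (2 ** m)      # start(i) = (n + 2^i) mod 2^m
            table.append(successor(start, active_nodes))
        return table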
|
|
|
Term
How is the finger table used? |
|
Definition
- Looking up the location of a key k starting at node n
- Node uses its finger table to route the request towards the closest preceding node of k
- If key identifier k is between n and successor(n)
- Then key is held by successor(n)
- Otherwise examine the finger table to find the entry to advance in search towards k
- Request is then forwarded to that node
- Essentially the search moves towards the target, with the finger table typically allowing a node to forward its query halfway along the remaining distance
- Average number of lookups is log2(N)
- Improves on average of N/2 for a simple search
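- A Python sketch of the lookup loop as a single-process simulation (the in_interval() ring test is a hypothetical helper):
    def find_successor(node, k):
        """Route towards key k through the finger tables."""
        while not in_interval(k, node.id, node.successor.id, inclusive_right=True):
            node = closest_preceding_finger(node, k)   # jump roughly halfway to the target
        return node.successor                          # this node is responsible for k

    def closest_preceding_finger(node, k):
        for entry in reversed(node.fingers):           # highest (furthest) finger first
            if in_interval(entry.id, node.id, k):
                return entry
        return node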
|
|
|
Term
Show an example of a Chord finger table lookup |
|
Definition
|
|
Term
What are some characteristics of Chord stabilisation? |
|
Definition
- It is important that successor information is up to date
- Each node periodically runs a process to verify its successor data and its finger table
- e.g. a node n will ask its successor what it believes its predecessor p to be; if a different value, then p becomes a candidate for n's new successor
- This also allows n's successor to update its predecessor to be n if necessary, for example if it no longer has a predecessor p
- The stabilisation messages run independently of any key lookup queries between nodes
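- A Python sketch of this exchange (method names are illustrative; in_interval() is the same hypothetical ring test as before):
    def stabilise(n):
        """Periodic check of n's successor."""
        x = n.successor.predecessor
        if x is not None and in_interval(x.id, n.id, n.successor.id):
            n.successor = x                 # a closer node has appeared; adopt it as successor
        n.successor.notify(n)               # tell the successor that n may be its predecessor

    def notify(n, candidate):
        if n.predecessor is None or in_interval(candidate.id, n.predecessor.id, n.id):
            n.predecessor = candidate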
|
|
|
Term
Why avoid .torrent files and use Magnet links? |
|
Definition
- Torrent files specify the content and where the tracker is
- Magnet links
- Hash code of torrent
- Unique for a given torrent (and thus the content it points to)
- DHT used to find values that are the IPs of peers
- Has been The Pirate Bay's default for a while now
- Can find further peers with Peer Exchange (PEX)
- Nodes ask peers to report their peers
- Link is simply a URI including a hash
- Hash is entry point to a DHT
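- The general shape of a BitTorrent magnet URI (values are placeholders):
    magnet:?xt=urn:btih:<40-hex-char-infohash>&dn=<display-name>&tr=<optional-tracker-url>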
|
|
|
Term
What are the principles of Gnutella? |
|
Definition
- Another P2P approach
- Forms a dynamic overlay network
- Each node is aware of and talks to a set of other nodes
- Nodes may come and go
- Uses message flooding
|
|
|
Term
What are the principles of Gnutella flooding? |
|
Definition
- Original Gnutella floods queries to find content
- A node queries all active neighbours in the overlay to find content corresponding to a particular key
- If neighbour has answer, responds
- Otherwise, neighbour forwards to its neighbours
- There is a TTL on the queries
- Decremented each time query is forwarded
- Generates a lot of messages and many hops
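- A Python sketch of TTL-limited flooding (node objects are hypothetical; duplicate suppression is covered on the next card):
    def query(node, key, ttl=7, origin=None):
        """Flood a query through the overlay until the TTL expires or the content is found."""
        if key in node.content:
            return node                      # this node has the content: respond
        if ttl == 0:
            return None                      # query dies out
        for neighbour in node.neighbours:
            if neighbour is not origin:
                hit = query(neighbour, key, ttl - 1, origin=node)
                if hit is not None:
                    return hit               # response back-propagates along the reverse path
        return None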
|
|
|
Term
What does Gnutella flooding consist of? |
|
Definition
- Sends a ripple of queries through the overlay
- Maximum TTL typically 7
- Nodes must note which queries they have seen to avoid loops
- Initial neighbours bootstrapped from a well-known list
- Node builds a list of clients (to max size) by querying other clients it knows
- Concept of 'ultra peers' was added to broaden reach
|
|
|
Term
How are Gnutella query responses handled? |
|
Definition
- Responses traverse back to the originator
- Back-propagation process
- Response follows the reverse path of the query
- Can provide some level of anonymity
|
|
|