Thursday, October 30, 2008

Reading for Lecture Oct 30

Chord: A Scalable Peer-to-peer Lookup Service for Internet Application

Chord provides support for just one operation: given a key, it maps the key onto a node. Chord uses a variant of consistent hashing to assign keys to Chord nodes. This tends to balance load, since each receives roughly the same number of keys. The major features of Chord are simplicity, provable correctness and provable performance.

Chord node need routing information about O(log N) nodes and a node resolves the hash function by communicating with other nodes. It resolves all lookups via O(log N) messages to other nodes. Chord maintains its information as nodes join and leave the system; with high probability each such event results in no more than O(log^2N) message.

The consistent hash function assigns each node and keys an m-bit identifier using a base hash function such as SHA-1. A node’s identifier is chosen by hashing the node’s IP address, while a key identifier is produced by hashing the key. Identifiers are ordered in an identifier circle modulo 2^m. The key k is assigned to the first node whose identifier is equal to or follows k in the identifier space. If identifiers are represented as a circle of numbers from 0 to 2^m – 1, then successor(k) is the first node clockwise from k.

Each node n maintains a routing table with at most m entries called the finger table. The i^th entry in the table at node n contains the identity of node s = successor(n + 2^i) where 1 <= I <= m. Thus each node only has to store information about a small number of nodes and know more about nodes closely follow it on the identifier circle then about nodes farther away. When a node n does not know the successor of a key k, it can find a node whose ID is closer than its own to k. That node will know more about the identifier circle in the region of k than n does.

I do not really understand the “with high probability” phrase.

Looking Up Data in P2P Systems

This paper describes the data looking-up mechanism in P2P systems. The mechanism needs to be scalable, and distributable.

In tree-like routing, each node records, for each prefix, the location of some node with that prefix. Pastry, an instance of the algorithm, gives each node a randomly chosen ID indicating its position on an identifier circle. It routes messages with a key to the live node with a node ID numerically closest to the key.

Each node n maintains a leaf set L, which is the set of |L|/2 nodes closest to n and larger than n, and the set of |L|/2 nodes closest to n and smaller than n. The correctness of this leaf set is the only requirement for correctness; forwarding is always correct, unless |L|/2 nodes with adjacent IDs fail simultaneously.

When a node n’s sought key is in the leaf set, the query is forwarded to that node. If it is not there, it tries to forward the query to the node in the leaf set that have the same closest prefix to the key. This continues until the query reach the destination.

No comments: