Group Meeting Minutes

 

Date: Aug. 4th, 2005

Place: SB220

Attendees: Wai Gen, Linh, Dongmei

 

²       Linh¡¯s presentation on the paper Sloppy Hashing and Self-Organizing Clusters. M. J. Freedman and D. Mazieres. In the Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS03), Berkeley, CA, 2003.

Ø        This paper presented two mechanisms applied to their Coral project for achieving scalability and high performance.

o       Distributed sloppy hash table (DSHT).

o       Decentralized clustering algorithm.

Ø        Two goals: avoid hot spots; find nearby data w/o querying distant nodes.

Ø        DSHT is specifically suited for locating replicated resources.

o       Sloppy insert: replica pointer appended to a ¡°full¡± node spills over to the previous node in the lookup path

o       Sloppy retrieve: returns some randomized subset of the pointers stored under a given key.

Ø        Each Coral node is a member of several DSHTs (clusters) of increasing diameter which is the max desired round-trip time (RTT) between any two nodes it contains). Also, each node has the same ID in all clusters.

Ø        Hierarchical Lookup:

o       Coral has 3 levels of clusters. A node belong to one cluster at each level:

¡ì         Level 2: many fast clusters with regional coverage (RTT~30 msec).

¡ì         Level 1: multiple clusters with continental coverage(RTT~100 msec)

¡ì         Level 0: one planet-wide cluster (RTT = inf).

o       Each cluster has a unique ID.

o       Coral uses this hierarchy for distance-optimized lookup. Details on inserting key/value pair and retrieving key are in section 2.2.

Ø        Cluster joining:

o       A node will only join an acceptable cluster (one in which the latency to 90% of the nodes is below the cluster¡¯s diameter); a node unable to find an acceptable cluster creates a new one with a random CID.

o       A node can join a better cluster whenever it learns of one.

Ø        Cluster merging: when a node knows of two acceptable clusters at a given level, it will join (switch to) the larger one.

Ø        Cluster spitting:

o       A cluster may need to split in order to remain acceptable to its nodes.

o       Coral specifies a node c within CID as a cluster center; all nodes near c join one cluster; all other nodes join a second cluster.

²       Dongmei¡¯s presentation on Kullback-Leibler divergence:

Ø        K-L divergence treated as a ¡°distance¡± between two distributions p and q.

Ø        General Concept:

Ø        In IR, K-L is a widely used distance metric to measure how well one probability distribution predict another, especially, to measure how well a topic model for topic T predicts a query Q.

o       A topic model for a topic T is a probability distribution {p1, p2, ¡­ pn} over a vocabulary set {w1, w2, ¡­wn}, where pi is the frequency with which word wi is used in the text of T.

o      
 

 


where f(Q, wi) is the number of occurrences of wi in Q;

 

 


                            where F(D, wi): number of occurrences of wi in D.

o       The smaller the value, the better the topic model of T predicts Q.

²       Linh explored query relaxation and presented experiment results for:

Ø        Early stop masking: stop querying once some results got returned.

Ø        Max succeeding subquery (xss) masking: used on failed queries Q which returned nothing; Try to compose Q¡¯ by using the terms of Q to return some results and any super queries of Q¡¯ return nothing.

²       Linh:

Ø        Try to explain and analyze the query masking experiment results.

Ø        Start the preparation for the qualifiers.

Ø        Make a consistant version of the p2psim scource code.

²       Dongmei:

Ø        Pick a paper to present for next week.

Text-Based Content Search and Retrieval in ad hoc P2P Communities. Francisco Matias Cuenca-Acuna and T. D. Nguyen, Department of Computer Science, Rutgers University.  In proceedings of the International Workshop on Peer-to-Peer Computing, May 2002.

http://www.panic-lab.rutgers.edu/Research/planetp/Publications/content_search_on_P2P_02(FINAL).pdf

Ø        Start the preparation for the qualifiers.

Ø        Ideas on journalizing the rare paper.

Ø        Work on limewire prototype.