Group Meeting Minutes
Date: Aug. 4th, 2005
Place: SB220
Attendees: Wai Gen, Linh, Dongmei
² Linh¡¯s presentation on the paper Sloppy Hashing and Self-Organizing Clusters. M. J. Freedman and D. Mazieres. In the Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS03), Berkeley, CA, 2003.
Ø This paper presented two mechanisms applied to their Coral project for achieving scalability and high performance.
o Distributed sloppy hash table (DSHT).
o Decentralized clustering algorithm.
Ø Two goals: avoid hot spots; find nearby data w/o querying distant nodes.
Ø DSHT is specifically suited for locating replicated resources.
o Sloppy insert: replica pointer appended to a ¡°full¡± node spills over to the previous node in the lookup path
o Sloppy retrieve: returns some randomized subset of the pointers stored under a given key.
Ø Each Coral node is a member of several DSHTs (clusters) of increasing diameter which is the max desired round-trip time (RTT) between any two nodes it contains). Also, each node has the same ID in all clusters.
Ø Hierarchical Lookup:
o Coral has 3 levels of clusters. A node belong to one cluster at each level:
¡ì Level 2: many fast clusters with regional coverage (RTT~30 msec).
¡ì Level 1: multiple clusters with continental coverage(RTT~100 msec)
¡ì Level 0: one planet-wide cluster (RTT = inf).
o Each cluster has a unique ID.
o Coral uses this hierarchy for distance-optimized lookup. Details on inserting key/value pair and retrieving key are in section 2.2.
Ø Cluster joining:
o A node will only join an acceptable cluster (one in which the latency to 90% of the nodes is below the cluster¡¯s diameter); a node unable to find an acceptable cluster creates a new one with a random CID.
o A node can join a better cluster whenever it learns of one.
Ø Cluster merging: when a node knows of two acceptable clusters at a given level, it will join (switch to) the larger one.
Ø Cluster spitting:
o A cluster may need to split in order to remain acceptable to its nodes.
o Coral specifies a node c within CID as a cluster center; all nodes near c join one cluster; all other nodes join a second cluster.
² Dongmei¡¯s presentation on Kullback-Leibler divergence:
Ø
K-L
divergence treated as a ¡°distance¡± between two distributions p and q.
Ø General Concept:
Ø In IR, K-L is a widely used distance metric to measure how well one probability distribution predict another, especially, to measure how well a topic model for topic T predicts a query Q.
o A topic model for a topic T is a probability distribution {p1, p2, ¡ pn} over a vocabulary set {w1, w2, ¡wn}, where pi is the frequency with which word wi is used in the text of T.
o
where f(Q, wi) is the number of occurrences of wi in Q;
where F(D, wi): number of occurrences of wi in D.
o The smaller the value, the better the topic model of T predicts Q.
² Linh explored query relaxation and presented experiment results for:
Ø Early stop masking: stop querying once some results got returned.
Ø Max succeeding subquery (xss) masking: used on failed queries Q which returned nothing; Try to compose Q¡¯ by using the terms of Q to return some results and any super queries of Q¡¯ return nothing.
² Linh:
Ø Try to explain and analyze the query masking experiment results.
Ø Start the preparation for the qualifiers.
Ø Make a consistant version of the p2psim scource code.
² Dongmei:
Ø Pick a paper to present for next week.
Text-Based Content Search and Retrieval in ad hoc P2P Communities. Francisco Matias Cuenca-Acuna and T. D. Nguyen, Department of Computer Science, Rutgers University. In proceedings of the International Workshop on Peer-to-Peer Computing, May 2002.
http://www.panic-lab.rutgers.edu/Research/planetp/Publications/content_search_on_P2P_02(FINAL).pdf
Ø Start the preparation for the qualifiers.
Ø Ideas on journalizing the rare paper.
Ø Work on limewire prototype.