Group Meeting Minutes
Date: July 7th, 2005
Place: SB232C
Attendees: Wai Gen, Linh, Dongmei
1/ KDD conference in Chicago: Dongmei, Linh and some other IIT students are volunteers:
- Make personal webpage
- Make business card
- Prepare a 30-second speech about what you (we) are doing
2/ Upcoming deadlines:
- SAC: Sep 3, Dijon-France
- EDBT: Sep 23, Munich
- ICDCS: Nov
- SIGMOD: Nov, Chicago-USA
- WWW: Nov
- SIGIR: Jan, Seattle-USA
- VLDB: Feb
- MDM: Oct, Japan
- ECIR: Oct, Portugal
Linh: work on the “query masking” paper for EDBT.
Dongmei: work on the prototype for SIGMOD.
3/ Wai Gen will talk about the intersection of P2P technologies and mobile computing at UIC on July 20.
4/ Wai Gen presented experimental results for the query masking technique:
- Based on the results, ranking by arrival is as effective as the other ranking techniques. Something must be wrong! (Bug in the simulator? Wrong parameter settings?)
- With query masking, the tf ranking function performs best. Without query masking, the gsize ranking function performs best. Why?
- Can we devise a better metric that combines both performance and cost?
5/ Dongmei’s presentation on the paper “An architecture for information retrieval over semi-collaborating peer-to-peer networks”. I. A. Klampanos and J. M. Jose. In Proceedings of the 2004 ACM Symposium on Applied Computing, volume 2, pages 1078–1083, Nicosia, Cyprus, March 14–17, 2004:
- Two important goals of a p2p Information Retrieval system are efficiently identifying relevant information sources and efficiently routing queries to those sources. The paper presents a cluster-based architecture for IR over p2p networks that targets these two goals.
- The term “semi-collaborating” means that nodes in the p2p network cooperate in order to perform information retrieval, but they do not have to share any detailed information, nor do they have to be consistent with respect to the IR systems they use.
- Each node in the network may choose to implement one or more of the following services: Client service, which provides the end-user interface; Information Provider service, which shares local documents with other nodes in the network; Hub service, which provides message routing for other nodes; and Fusion service, which handles the fusion of retrieved results on behalf of the querying node.
- Content-Aware Clustering: two clustering stages are applied, in-peer document clustering and peer clustering.
- In-peer document clustering: each document is represented by its term frequency vector. Initially, each cluster contains a single document. The cosine similarity formula is used to measure the distance between any two vectors. At each step, the two clusters closest to each other are merged to form a new cluster. The clustering process stops when there is no single-document cluster left.
- Peer clustering: peers are clustered into Content-Aware Groups (CAGs), based on their internal clusters, the variance of their clusters, and the participation level of their clusters. One peer may belong to many CAGs. A peer is merged with a CAG if the difference between its centroid vector and that of the CAG is smaller than a threshold value, its variance is smaller than the CAG variance, and its participation level is greater than that of the CAG.
- Query routing:
  - Each hub-enabled peer must maintain the descriptors of all CAGs in the network.
  - Upon receiving a query, a hub-enabled peer will score and rank all the CAGs based on their distance to the query, their variance, and their participation level. The query will be sent to the top n CAGs.
  - Within each CAG, peers get ranked following the same procedure, and the query is routed to the top m peers.
- Combination of results: using the Dempster-Shafer (D-S) theory of evidence combination (?). Each peer is assigned an un-trust coefficient based on its ranking score, and each result retrieved from a peer is scored based on its ranking score and the total number of results. The product of the peer’s un-trust coefficient and the result’s ranking score is used as the global ranking score for that result.
- Evaluation: experiments were conducted to show that the technique can work efficiently.
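The in-peer document clustering and the CAG ranking step summarized above can be illustrated with a small Python sketch. This is only a minimal illustration written for these minutes, not the paper's implementation. All function names are hypothetical. Whitespace tokenization and the use of cluster centroids to compare clusters are assumptions, since the minutes do not say how the distance between two multi-document clusters is computed. The routing function ranks CAGs by cosine similarity to the query only and leaves out the variance and participation level terms, whose exact weighting is not recorded here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a document (assumes whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Mean vector of a cluster (assumed cluster representative)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_documents(docs):
    """Merge the two most similar clusters until no single-document cluster is left."""
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per document
    while len(clusters) > 1 and any(len(c) == 1 for c in clusters):
        best = None  # (similarity, index a, index b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid([vectors[i] for i in clusters[a]]),
                             centroid([vectors[i] for i in clusters[b]]))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def route_query(query, cag_centroids, n):
    """Return the names of the top n CAGs by similarity to the query (simplified)."""
    q = tf_vector(query)
    ranked = sorted(cag_centroids,
                    key=lambda name: cosine(q, cag_centroids[name]),
                    reverse=True)
    return ranked[:n]


docs = ["apple banana apple", "banana apple", "car engine wheel", "engine car"]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

With these four toy documents, the two fruit documents merge first and the two car documents second, at which point no single-document cluster remains and the loop stops. The same top-n ranking could be applied to the peers within a CAG to pick the top m peers.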
6/ Dongmei presented experimental results based on a uniform distribution of documents: the simulator needs to be modified and the experiments rerun.
7/ Assignments:
Linh: critique the ICDE paper (query masking paper); finish experiments; continue to work on the query masking technique
Dongmei: modify the simulator and rerun the experiments
8/ Paper for next meeting: Making Peer-to-Peer Keyword Searching Feasible Using Multi-level Partitioning. Shuming Shi et al. Tsinghua Univ. Third International Workshop on Peer-to-Peer Systems 2004 (IPTPS’04).
http://iptps04.cs.ucsd.edu/papers/shi-keysearch.pdf