Meeting Minutes
Date: 06/23/05
Place: SB 232C
From 10:30 am to 1:00 pm
Attendees: Dr. Wai Gen Yee,
Dongmei Jia,
Linh Thai Nguyen.
1/ The “query masking” paper (Improving Search Performance
in P2P File Sharing Systems by Query Masking) has been submitted to ICDE 2006.
The main idea of query masking is to use sub-queries to increase the probability
of finding the desired data objects. In this work, we tried different
combinations of metadata distribution techniques, ranking functions and
client-centric masking techniques to see whether any combination works well.
Experimental results show that query masking can improve query
performance in P2P systems. In the future, we will investigate how
server-centric masking techniques affect query performance.
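Note: the sub-query idea can be illustrated with a minimal sketch. It assumes that masking a query means dropping some of its terms and issuing the remaining terms as a sub-query; the function name masked_subqueries and the parameter unmasked are hypothetical and are not taken from the paper or the simulator.

```python
from itertools import combinations


def masked_subqueries(query, unmasked):
    """Return every sub-query that keeps exactly `unmasked` terms of `query`.

    Hypothetical helper: it only illustrates the idea of issuing sub-queries,
    not the masking strategies evaluated in the paper.
    """
    terms = query.split()
    if not 1 <= unmasked <= len(terms):
        raise ValueError("unmasked must be between 1 and the query length")
    # combinations() preserves the original term order inside each sub-query.
    return [" ".join(combo) for combo in combinations(terms, unmasked)]


if __name__ == "__main__":
    for sub in masked_subqueries("beatles yellow submarine", 2):
        print(sub)
```

Each sub-query is less specific than the full query, so it can match descriptors that omit some of the original terms.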
2/ Dr. Yee’s findings from the experimental results:
- The simulator is slow. To run extensive
experiments on large datasets, we need to optimize the code.
- The dataset used is the small dataset (part 4). The query
length and the degree of masking are both varied.
- Effect of query length on ranking performance: short
queries return more results. However, there is more noise in the
results, and short queries contain little information, so it is
difficult to rank the results correctly.
- Effect of query length and degree of masking on ranking
performance: when the query length is 6 and the number of unmasked terms is 1,
ranking based on term frequency is the best; when the query length is 1 and
the number of unmasked terms is 1, ranking based on group size is the best.
This pattern is quite strange (why?).
- For part 4 data, only 1 query out of 10 works.
- If we increase the number of peers to 5000 (instead of
1000), ranking performance improves significantly (but the simulator runs
very slowly).
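Note: the two ranking functions compared above can be sketched as follows. This is an illustration only. It assumes that results are grouped by a file identifier, that "group size" is the number of results in a group, and that "term frequency" counts occurrences of query terms in a group's descriptors. All names are hypothetical and the simulator's actual definitions may differ.

```python
from collections import defaultdict


def group_results(results):
    """Group (file_id, descriptor) pairs by file_id (assumed grouping key)."""
    groups = defaultdict(list)
    for file_id, descriptor in results:
        groups[file_id].append(descriptor)
    return groups


def rank_by_group_size(groups):
    """Rank file ids by number of results in the group, largest first."""
    return sorted(groups, key=lambda fid: (-len(groups[fid]), fid))


def rank_by_term_frequency(groups, query):
    """Rank file ids by how often query terms occur in the group's descriptors."""
    terms = set(query.lower().split())

    def score(fid):
        return sum(
            1
            for descriptor in groups[fid]
            for word in descriptor.lower().split()
            if word in terms
        )

    return sorted(groups, key=lambda fid: (-score(fid), fid))


if __name__ == "__main__":
    results = [
        ("f1", "yellow submarine"),
        ("f1", "submarine"),
        ("f1", "beatles"),
        ("f2", "yellow submarine beatles yellow submarine"),
    ]
    groups = group_results(results)
    print(rank_by_group_size(groups))
    print(rank_by_term_frequency(groups, "yellow submarine"))
```

On this toy input the two functions order the groups differently, so which one ranks better can depend on the query and on how descriptors are distributed.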
3/ Missing parts:
- We need sound arguments that can formally explain
why some ranking functions, in specific situations, outperform other
ranking functions.
- Which ranking function works best for a uniform distribution
of data objects?
- Which is better: client-centric query masking or
server-centric query masking? In addition, which query masking technique
should be used?
4/ Dongmei’s presentation: SETS: Search Enhanced by Topic
Segmentation.
- Each site can belong to only one topic segment.
- The system needs one centralized site to serve as the
administrative site. This site receives summary vectors from the other sites
in the system and classifies these vectors using the k-means algorithm, then it
distributes the centroid vector of each topic segment to every site in
the system.
- The global and local query routing algorithms are not
discussed in detail.
- The proposed techniques and structures are not novel
(representing documents by keyword vectors, classifying using k-means...) but
this work is complete.
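Note: the clustering step performed by the administrative site can be sketched as below. This is a plain k-means over per-site summary vectors, written under assumptions: the initial centroids are simply the first k vectors, the distance is Euclidean, and all names are hypothetical. The SETS paper may use a different initialization or similarity measure.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Cluster site summary vectors into k topic segments.

    Returns (centroids, assignment), where assignment[i] is the single
    segment index of site i, so each site belongs to exactly one segment.
    """
    # Assumption: deterministic initialization from the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every site to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if a segment is empty.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


if __name__ == "__main__":
    site_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    centroids, assignment = kmeans(site_vectors, k=2)
    print(assignment)
    print(centroids)
```

The returned centroids correspond to the vectors that the administrative site would distribute to every site.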
5/ Dr. Yee’s recommendations:
- Every reviewed paper needs to be summarized.
- Students should try to work independently and write up their work for
others to read and comment on.
- Be proactive in research.
6/ Assignments for next week:
- Dongmei: continue to work on the prototype.
- Linh: compare client-centric masking and
server-centric masking; find the best ranking function; find the best
masking function.
7/ Miscellaneous:
- One bug in the simulator was found and fixed: the parameter
related to the computation of the inverse ranking score was wrongly set to 1.