5. Conclusions and future work
In this paper, we presented KBCS, which represents each collection as a weighted entity set based on context- and structure-based measures, and ranks collections according to a query-collection similarity metric that considers the sampling factor, collection entity frequency, collection entity weight, and query entity weight. KBCS exploits the context and structure semantic information encoded in DBpedia, and integrates a DBpedia-based query expansion method to enrich the entities found in query terms. We evaluated KBCS on the large CW09-CatB dataset, partitioned into topical collections, and the experimental results demonstrate the effectiveness of KBCS.
In the future, we will use other semantic measures (e.g., a content-based measure) to weight the entities in CSI more accurately, and combine document scores with entity weights to improve the query-collection similarity metric. Furthermore, we will apply context- and structure-based measures to estimate the quality of expanded entities, so that the weights assigned to expanded entities can be optimized and the performance of community search in query expansion improved.