De-duplication in KOBV

Stefan Lohrum, Wolfram Schneider, Josef Willenborg


Abstract: In KOBV we offer users an efficient tool, the KOBV search engine, for searching library catalogues that are accessible regionally and worldwide. Searches are performed by distributed Z39.50 retrieval and by an index-based quicksearch. Because many catalogues are queried, result sets may contain a significant number of duplicate records.
Therefore we integrate a de-duplication procedure into the KOBV search engine. It is part of both the distributed search and the KOBV quicksearch.
The main goals are to present uniform retrieval results, to preserve retrieval quality, and to cut out redundant information. Last but not least, we keep an eye on efficiency. De-duplication is fully parametrizable, so settings can easily be changed online.

Keywords: de-duplication method, distributed library system, clustering
MSC: 94-XX
CR: H.3.6, H.3.3

You can get a printed copy of the paper for free. Please contact the Konrad-Zuse-Zentrum and ask for the ZIB preprint SC 99-05.

Online version: HTML, PostScript, PDF

$Date: 2007-05-20 08:45:22 $