Introduction

Since the 1970s, library catalogs have been captured electronically. Advances in computer technology over the last few years have created the preconditions for managing an ever-growing amount of information electronically. Prompted by the success of the World Wide Web, many providers are beginning to open their databases to external users. This allows users to search a database and exchange data directly via the Internet.

It is the aim of this diploma thesis to develop a library information system which supports librarians in their search and capture of documents. The name of the system is ZACK.

The user of ZACK can search one or several bibliographic databases for a document and include a suitable document in a local database. Including data records from an external database makes the capture of new documents much easier, as it reduces internal cataloging to a minimum. Duplicated work is eliminated and a consistently high quality of data records is ensured.

The distributed search in ZACK is carried out in several databases in parallel. Duplicates are recognized, and a short list of matches without duplicate entries is offered. The user can then decide from which database to include data records. In practice, a distributed search has produced far better results than a search in a single database. Response times are within acceptable limits for the user. The duplicate check makes the short list of hits clearer and more concise.
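The idea of a parallel search followed by a duplicate check can be illustrated with a minimal sketch. It is not the implementation used in ZACK: the in-memory tables stand in for remote databases, and the names DATABASES, search_one, record_key and distributed_search, as well as the choice of title, author and year as the comparison key, are assumptions made only for illustration. The algorithms actually used are the subject of Chapter 6.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote bibliographic databases.
DATABASES = {
    "library_a": [
        {"title": "Faust", "author": "Goethe", "year": "1808"},
        {"title": "Effi Briest", "author": "Fontane", "year": "1895"},
    ],
    "library_b": [
        {"title": "faust", "author": "Goethe", "year": "1808"},
    ],
}


def search_one(name, term):
    """Return (database name, record) pairs whose title contains the term."""
    term = term.lower()
    return [(name, rec) for rec in DATABASES[name] if term in rec["title"].lower()]


def record_key(rec):
    """Illustrative duplicate key: lowercased fields with collapsed blanks."""
    return tuple(" ".join(rec[f].lower().split()) for f in ("title", "author", "year"))


def distributed_search(term, names):
    """Query all databases in parallel, then merge hits and group duplicates."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() preserves the order of `names`, so the merge is deterministic.
        results = list(pool.map(lambda n: search_one(n, term), names))
    merged = {}
    for hits in results:
        for name, rec in hits:
            entry = merged.setdefault(record_key(rec), {"record": rec, "sources": []})
            entry["sources"].append(name)
    return list(merged.values())


short_list = distributed_search("faust", ["library_a", "library_b"])
```

Each entry of the short list keeps the names of all databases that returned the record, so the user can still choose the source from which to include it.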

There are already systems which offer a distributed search without duplicate check (see section Existing Systems for a Distributed Search). The system ZACK developed in this diploma thesis, however, offers, for the first time, the possibility to search simultaneously in several databases of different libraries and to carry out an online duplicate check.

ZACK can be used from any computer, regardless of the hardware, operating system or graphical interface in use (Windows 3.11, X11, VT100 terminals). The only precondition is a web browser, i.e., software which is available on nearly every computer. On the basis of the IP address, individual settings can be made for each user as to which databases may be accessed and whether the user is authorized to include data records in the user's own database. ZACK is bilingual: the user can choose between a German and an English interface.
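How settings could be selected by IP address is sketched below. This is only an illustration under stated assumptions: the table SETTINGS, the function settings_for, the field names and the example networks are hypothetical and are not taken from ZACK's configuration.

```python
import ipaddress

# Hypothetical settings table, checked from most specific to least specific.
SETTINGS = [
    (
        ipaddress.ip_network("192.0.2.0/24"),
        {"databases": {"library_a", "library_b"}, "may_import": True},
    ),
    (
        ipaddress.ip_network("0.0.0.0/0"),
        {"databases": {"library_a"}, "may_import": False},
    ),
]


def settings_for(ip):
    """Return the settings of the first network that contains the address."""
    addr = ipaddress.ip_address(ip)
    for network, settings in SETTINGS:
        if addr in network:
            return settings
    raise LookupError(f"no settings configured for {ip}")


local_user = settings_for("192.0.2.17")
external_user = settings_for("198.51.100.5")
```

In this sketch a user from the example network 192.0.2.0/24 may search both databases and include records, while any other IPv4 address falls through to the restrictive default entry.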

Survey of chapters

Chapter 2 Distributed Search contains basic considerations of information systems in general. It is explained why the search in several databases produces better results than the search in only one database.

Chapter 3 Modelling of ZACK explains on the basis of the client-server model how the search in a library database via the Internet is carried out. It contains a brief explanation of how communication between client and server takes place and which different protocols are used in ZACK. The Z39.50 protocol is introduced.

In Chapter 4 Implementation of ZACK the system as such is introduced. The design and implementation stages are illustrated; it is investigated which existing software was considered for the development of the system and which software is used in ZACK. With the first system, a search in only one database is possible, whereas the second system allows a search in several databases in parallel.

The second system of ZACK is able to compare data records of different origin during the duplicate check. However, before this can take place, several factors such as character sets, incorrect entries (e.g. double blanks) and different methods of data capture have to be recognized and processed. Chapter 5 Standardization contains an analysis of data records according to these criteria.
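A minimal sketch of such a preprocessing step is given below. It only illustrates two of the criteria named above, differing character encodings of the same letter and double blanks; the function name normalize_field and the use of Unicode normalization are assumptions for illustration and do not describe the procedure analyzed in Chapter 5.

```python
import unicodedata


def normalize_field(value):
    """Bring a field into a comparable form before records are compared."""
    # Unify composed and decomposed forms of the same character (e.g. umlauts).
    composed = unicodedata.normalize("NFC", value)
    # Collapse runs of blanks and strip leading and trailing whitespace.
    collapsed = " ".join(composed.split())
    # Compare without regard to upper and lower case.
    return collapsed.casefold()


same = normalize_field("Mu\u0308ller,  Hans") == normalize_field("M\u00fcller, Hans ")
```

After normalization, the two spellings in the example compare as equal although they differ in encoding and spacing.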

Chapter 6 Duplicate Check describes how the duplicate check is carried out by the system. The algorithms used, the computing effort required for ZACK and the results of the duplicate check are explained.

As a result of the duplicate check the user is offered a short list of matches without duplicate entries. In Chapter 7 Duplicate Output the procedures available are described and evaluated; the procedure used in ZACK is illustrated.

Chapter 8 Practical Results of a Distributed Search illustrates how successful the distributed search is in practice. For this purpose, inquiries from librarians in Brandenburg who have used ZACK for their search are evaluated: in one instance the inquiries made within one day, and in another the inquiries made over a period of several months.

Chapter 9 Problems during Operation describes minor and major problems which have occurred in the practical use of the Z39.50 servers addressed by ZACK. Configuration problems with Z39.50 servers of various library systems, from different manufacturers and in several libraries, are described in detail.

Chapter 10 Outlook contains a conclusion for this diploma thesis as well as proposals for a further use of the system.

Appendix A Analysis of the MAB2 Data Records of the Deutsche Bibliothek presents a statistical evaluation of 2.5 million data records of the Deutsche Bibliothek. The aim is to find out which fields within data records can be used effectively, what the proportion is between books from publishing houses and other literature, and what relations (hierarchies, references) exist between the data records. This information is required for the duplicate check and the output of the data records.

Appendix B Z39.50-Servers lists the Z39.50 servers used, Appendix C Software contains a short description of the software developed for ZACK, and Appendix D Access Statistics presents an evaluation of the use of ZACK in practice. The appendix is complemented by a List of Abbreviations and a Bibliography.


Footnote

...ZACK
ZACK is indeed the name of the system, not an abbreviation.

Copyright (c) 1999 Wolfram Schneider
URL: http://wolfram.schneider.org/lv/diplom/
E-Mail: wosch@freebsd.org
4 July 1999