Example 1: XML Indexing And Searching

This example shows how Zebra can be used with absolutely minimal configuration to index a body of XML documents, and search them using XPath expressions to specify access points.

Go to the examples/zthes subdirectory of the distribution archive. There you will find a Makefile that will populate the records subdirectory with a file of Zthes records representing a taxonomic hierarchy of dinosaurs. (The records are generated from the family tree in the file dino.tree.) Type make records/dino.xml to make the XML data file.

Now we need to create a Zebra database to hold and index the XML records. We do this with the Zebra indexer, zebraidx, which is driven by the zebra.cfg configuration file. For our purposes, we don't need any special behaviour - we can use the defaults - so we start with a minimal file that just tells zebraidx where to find the default indexing rules, and how to parse the records:

    profilePath: .:../../tab
    recordType: grs.sgml
   

That's all you need for a minimal Zebra configuration. Now you can roll the XML records into the database and build the indexes:

    zebraidx update records
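
The two steps so far (writing the configuration and running the indexer) could be scripted roughly as follows. This is only a sketch: the names MINIMAL_CONFIG, render_config and build_index are hypothetical helpers, not part of Zebra, and the script assumes that zebraidx is on your PATH and picks up the zebra.cfg file from the directory it is run in.

```python
import shutil
import subprocess
from pathlib import Path

# Settings copied from the minimal zebra.cfg shown above.
MINIMAL_CONFIG = {
    "profilePath": ".:../../tab",
    "recordType": "grs.sgml",
}


def render_config(settings):
    """Render the settings as 'name: value' lines, one per setting."""
    return "".join(f"{name}: {value}\n" for name, value in settings.items())


def build_index(workdir, records_dir="records"):
    """Write zebra.cfg into workdir, then run 'zebraidx update <records_dir>'.

    Returns False if zebraidx is not on PATH (the config is still written).
    Raises subprocess.CalledProcessError if the indexer exits with an error.
    """
    workdir = Path(workdir)
    (workdir / "zebra.cfg").write_text(render_config(MINIMAL_CONFIG))
    if shutil.which("zebraidx") is None:
        return False
    # Same command as above, run from the directory holding zebra.cfg.
    subprocess.run(["zebraidx", "update", records_dir], cwd=workdir, check=True)
    return True


# Show the configuration text that build_index() would write.
print(render_config(MINIMAL_CONFIG), end="")
```

Calling build_index(".") from the examples/zthes directory would then be equivalent to creating the file by hand and typing the zebraidx command yourself.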
   

Now start the server. Like the indexer, its behaviour is controlled by the zebra.cfg file, and like the indexer, it works just fine with this minimal configuration.

    zebrasrv

By default, the server listens on TCP port 9999, although this can easily be changed - see the section "Running the Z39.50 Server (zebrasrv)" in Chapter 7.

Now you can use the Z39.50 client program of your choice to execute XPath-based boolean queries and fetch the XML records that satisfy them:

    $ yaz-client tcp:@:9999
    Connecting...Ok.
    Z> find @attr 1=/Zthes/termName Sauroposeidon
    Number of hits: 1
    Z> format xml
    Z> show 1
    <Zthes>
     <termId>22</termId>
     <termName>Sauroposeidon</termName>
     <termType>PT</termType>
     <relation>
      <relationType>BT</relationType>
      <termId>21</termId>
      <termName>Brachiosauridae</termName>
      <termType>PT</termType>
     </relation>

      <idzebra xmlns="http://www.indexdata.dk/zebra/">
        <size>245</size>
        <localnumber>23</localnumber>
        <filename>records/dino.xml</filename>
      </idzebra>
    </Zthes>
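
To see which part of the record the access point /Zthes/termName names, it can help to evaluate the same path outside Zebra. The sketch below uses Python's standard xml.etree.ElementTree module on a copy of the record above (without the idzebra block). The helper values_at is hypothetical and handles only simple absolute paths. It illustrates ordinary path semantics, not how Zebra evaluates queries internally.

```python
import xml.etree.ElementTree as ET

# The record returned by "show 1" above, minus the <idzebra> metadata block.
RECORD = """\
<Zthes>
 <termId>22</termId>
 <termName>Sauroposeidon</termName>
 <termType>PT</termType>
 <relation>
  <relationType>BT</relationType>
  <termId>21</termId>
  <termName>Brachiosauridae</termName>
  <termType>PT</termType>
 </relation>
</Zthes>
"""


def values_at(record_xml, path):
    """Return the text of every element selected by a simple absolute path.

    Only paths of the form /root/child/grandchild are supported.
    """
    root = ET.fromstring(record_xml)
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # the path starts at a different root element
    if len(steps) == 1:
        return [root.text or ""]
    # ElementTree paths are relative to the root, so drop the first step.
    return [el.text or "" for el in root.findall("/".join(steps[1:]))]


print(values_at(RECORD, "/Zthes/termName"))           # ['Sauroposeidon']
print(values_at(RECORD, "/Zthes/relation/termName"))  # ['Brachiosauridae']
```

The top-level term name and the name of the broader term inside the relation element sit at different paths, which is why the path in the query matters.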
   

Now wasn't that easy?