OVERVIEW

This page describes the process of searching with Swish-e. Please see the SWISH-CONFIG page for information on the Swish-e configuration file directives, and SWISH-RUN for a complete list of command line arguments.

Searching a Swish-e index involves passing command line arguments to Swish-e that specify the index file to use and the query (or search words) to locate in the index. Swish-e returns a list of file names (or URLs) that contain the matched search words. Perl is often used as a front-end to Swish-e, such as in CGI applications, and Perl modules exist for interfacing with Swish-e.

Searching Syntax and Operations
The following section describes various aspects of searching with Swish-e.

Boolean Operators

You can use the Boolean operators ``and'', ``or'', and ``not'' when searching. Without these operators, Swish-e assumes the words are joined with ``and''. The operators are not case sensitive. [Note: you can change the default to ``or'' by changing the variable DEFAULT_RULE in the config.h file and recompiling Swish-e.] Evaluation takes place from left to right only, although you can use parentheses to force the order of evaluation. Examples:
Retrieves files containing either the words ``smilla'' or ``snow''.
retrieves first the files that contain both the words ``smilla'' and ``snow''; then, among those, the ones that do not contain the word ``sense''.

Truncation

The wildcard (*) is available; however, it can only be used at the end of a word. Anywhere else it is considered a normal character (i.e., it can be searched for if it is included in the WordCharacters directive).
this query only retrieves files which contain the given word. On the other hand:
retrieves ``librarians'', ``librarianship'', etc. along with ``librarian''.
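The query rules described so far can be modeled with a small evaluator. These rules are an implicit ``and'' between words, strict left-to-right evaluation unless parentheses override it, and a trailing ``*'' that matches any word beginning with the given letters. The sketch below is in Python and runs against a made-up, in-memory index. The INDEX contents, file names, and helper names are hypothetical, and Swish-e's own index format and query parser work differently. The sketch ignores stemming, meta names, and phrases.

```python
from bisect import bisect_left

# Hypothetical in-memory index: word -> set of file names. This only models
# the query rules described above; it is not how Swish-e stores its index.
INDEX = {
    "librarian": {"a.html"},
    "librarians": {"b.html"},
    "librarianship": {"c.html"},
    "smilla": {"a.html", "b.html", "c.html"},
    "snow": {"b.html", "c.html", "d.html"},
    "sense": {"c.html"},
}
SORTED_WORDS = sorted(INDEX)
OPERATORS = {"and", "or", "not"}


def lookup(term):
    """Files containing term; a trailing '*' matches every word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        files = set()
        # Words sharing a prefix sit next to each other in sorted order.
        i = bisect_left(SORTED_WORDS, prefix)
        while i < len(SORTED_WORDS) and SORTED_WORDS[i].startswith(prefix):
            files |= INDEX[SORTED_WORDS[i]]
            i += 1
        return files
    # A '*' anywhere else is just part of the word.
    return set(INDEX.get(term, ()))


def evaluate(query):
    """Evaluate a query string and return the set of matching files."""
    # Operators are case-insensitive; the toy index is all lower case.
    tokens = query.lower().replace("(", " ( ").replace(")", " ) ").split()
    if not tokens:
        return set()
    result, _ = _expr(tokens, 0)
    return result


def _operand(tokens, pos):
    """A single word, or a parenthesized sub-expression."""
    if tokens[pos] == "(":
        value, pos = _expr(tokens, pos + 1)
        return value, pos + 1  # step past the closing ")"
    return lookup(tokens[pos]), pos + 1


def _expr(tokens, pos):
    """Combine operands strictly from left to right."""
    result, pos = _operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] != ")":
        op = tokens[pos]
        if op in OPERATORS:
            pos += 1
        else:
            op = "and"  # no operator between two words means "and"
        right, pos = _operand(tokens, pos)
        if op == "and":
            result = result & right
        elif op == "or":
            result = result | right
        else:  # "not": drop files that contain the right-hand operand
            result = result - right
    return result, pos
```

With this toy data, evaluate("smilla not snow or sense") groups as (smilla not snow) or sense and returns {"a.html", "c.html"}. By contrast, evaluate("smilla not (snow or sense)") applies the ``not'' to the whole group and returns {"a.html"}.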
Note that wildcard searches combined with word stemming can lead to
unexpected results. If stemming is enabled, a search term with a wildcard
will be stemmed internally before searching. So searching for
Order of Evaluation

Expressions are always evaluated left to right:
retrieves files which contain ``juliet'' and ``pac'' but not ``ophelia''. However, it is always possible to force the order of evaluation by using parentheses. For example:
retrieves files that contain ``juliet'' and contain neither ``ophelia'' nor ``pac''.

Meta Tags

MetaNames are used to represent fields (called columns in a database) and provide a way to search only parts of a document. See SWISH-CONFIG for a description of MetaNames and how they are specified in the source document. To limit a search to words found in a meta tag, prefix the keywords with the name of the meta tag followed by the equal sign:
Spaces on either side of the ``='' are optional, so the following are equivalent:
To search on a word that contains a ``='', precede the ``='' with a ``\'' (backslash).
this query returns the files where the word ``x=4'' is associated with the metaName ``test=3'', or that contain the word ``y=5'' not associated with any metaName. Queries can also be constructed using any of the usual search features; moreover, metaName and plain searches can be mixed in a single query.
This query will retrieve all the files in which ``a1'' or ``a2'' are found in the META tag ``metaName1'' and that do not contain the words ``a3'' and ``a7'', where ``a3'' and ``a7'' are not associated with any meta name.

Phrase Searching

To search for a phrase in a document, use double quotes to delimit your search terms. (The phrase delimiter is set in src/swish.h.) You must protect the quotes from the shell. For example, under Unix:
Or under Windows:
You can use the
At times you might not want to search for a word in every part of your files since you know that the

The -t option in the search command line allows you to search for words that exist only in specific HTML tags. Each character in the string you pass as the argument to this option represents a different tag in which the word is searched; that is, you can use any combination of the following characters:
Searching with Perl

Perl ( http://www.perl.com/ ) is probably the most common programming language used with Swish-e, especially in CGI interfaces. Perl makes searching and parsing results with Swish-e easy, but careless code can leave your server vulnerable to attack. When designing your CGI scripts, you should carefully screen user input and include features such as paged results and a timer that limits how long a search may run. These features help protect your web site against denial of service (DoS) attacks.

Included with every distribution of Perl is a document called perlsec -- Perl Security. Please take time to read and understand that document before writing CGI scripts in Perl. Type at your shell/command prompt:
If nothing else, start every CGI program in Perl like this:
That alone won't make your script secure, but it may help you find insecure code.

CGI Danger!
There are many examples of CGI scripts on the Internet. Many are poorly
written and insecure. A commonly seen way to execute Swish-e from a perl
CGI script is with a piped open. For example, it is common to see this type of
This

Even if you can be sure that any user-supplied data is safe, this piped open still passes the command parameters through the shell. At the very least, that is an extra, unnecessary step in running Swish-e.
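The difference between a piped open and a direct fork and exec is whether a shell sits between your program and Swish-e. The sketch below shows the direct approach in Python rather than Perl. subprocess.run() is given a list of arguments, so no shell is started, and characters such as ``;'' or ``|'' in a query reach Swish-e as ordinary text. The sketch assumes the binary is named swish-e and is on the PATH. The -w (search words) and -f (index file) switches are Swish-e's own. The screening pattern, the 200-character limit, and the timeout are hypothetical values that you would adapt to your site.

```python
import re
import subprocess

# Hypothetical screening policy: word characters, whitespace, double quotes
# (phrases), parentheses, "=" (meta names), "*" (truncation), ".", "-" and
# "\" (escaped "="), at most 200 characters.
ALLOWED_QUERY = re.compile(r'[-\w\s"()=*.\\]{1,200}')


def screen_query(query):
    """Return True if the whole query matches the (hypothetical) allowed pattern."""
    return ALLOWED_QUERY.fullmatch(query) is not None


def build_swish_argv(query, index_file, swish_binary="swish-e"):
    """Build the argument list for a direct exec of swish-e.

    Each element is handed to the program as exactly one argument, so shell
    metacharacters in the query are never interpreted.
    """
    return [swish_binary, "-w", query, "-f", index_file]


def run_search(query, index_file, timeout_seconds=10):
    """Screen the query, run swish-e without a shell, and return its output lines."""
    if not screen_query(query):
        raise ValueError("query rejected by screening policy")
    completed = subprocess.run(
        build_swish_argv(query, index_file),  # list form: no shell involved
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired when exceeded
        check=False,
    )
    return completed.stdout.splitlines()
```

Screening and the argument list are separate defenses. Even a hostile string such as juliet; rm -rf / would stay a single argument to Swish-e if it ever got past screen_query(). The timeout corresponds to the search timer recommended earlier.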
Therefore, the recommended approach is to fork and exec Swish-e directly, without going through the shell.

Type:
If all this sounds complicated, you may wish to use a Perl module that does all the hard work for you.

Perl Modules

There are a couple of Perl modules for accessing Swish-e. One module is included with the distribution, and the other module (or set of modules) is located on CPAN. The included module provides a way to embed Swish-e into your Perl program, while the modules on CPAN provide an abstracted interface to it. Hopefully, they make using Swish-e easier.

The Included SWISHE Perl Module

When compiling Swish-e from source, the build process creates a C library (see the Swish-e INSTALL documentation). The Swish-e distribution includes a perl directory with the files required to create the SWISHE.pm module. This module embeds Swish-e into your Perl program so that searching does not require running an external program. Embedding Swish-e results in faster searches, since it avoids the cost of forking and exec'ing a separate program and of opening the index file for each request. You will probably not want to embed Swish-e into Perl when running under mod_perl, as you will end up with very large Apache processes.

Building and usage instructions for the SWISHE.pm module can be found in the SWISH-PERL man page. Here's an edited snip from that man page:
SWISH Modules on CPAN

The Comprehensive Perl Archive Network, or CPAN, is a collection of modules for use with Perl. Always search CPAN (http://search.cpan.org/) before starting any new program; chances are someone has already written just what you need.

CPAN also has modules for searching with Swish-e. They can be found at http://search.cpan.org/search?mode=module&query=SWISH

The main SWISH module (different from the SWISHE module included with the Swish-e distribution) provides a high-level object-oriented interface to Swish-e. You can use the same interface either to fork and exec the Swish-e binary or to use the Swish-e C library, if installed. Switching between the two requires changing just one line of code. A server interface will be written when a Swish-e server is written. The main idea is that you can write a program to search with Swish-e, but not have to change your code (much) when you wish to change to a new way of accessing Swish-e. Here's an example of SWISH module usage from the synopsis:
This takes care of running Swish-e in a secure way, parsing its output, and providing OO methods for accessing the resulting data.

Document Info

$Id: SWISH-SEARCH.pod,v 1.4 2002/04/15 02:34:43 whmoseley Exp $