Table of Contents:
SWISH-E is Simple Web Indexing System for Humans - Enhanced. SWISH-E can quickly and easily index directories of files or remote web sites and search the generated indexes.
SWISH was created by Kevin Hughes to fill the need of the growing number of Web administrators on the Internet - many of the indexing systems were not well documented, were hard to use and install, and were too complex for their own good. The system was widely used for several years, long enough to collect some bug fixes and requests for enhancements.
In Fall 1996, The Library of UC Berkeley received permission from Kevin Hughes to implement bug fixes and enhancements to the original binary. The result is SWISH-Enhanced or SWISH-E, brought to you by the SWISH-E Development Team.
SWISH-E version 2.2 represents a major rewrite of the code and the addition of many new features. Memory requirements for indexing have been reduced, and indexing speed is much improved. New features allow more control over indexing, better document parsing, improved indexing and searching logic, better filter code, and the ability to index from any data source.
[ TOC ]
Quickly index a large number of documents in different formats including text, HTML, and XML
Includes a web spider for indexing remote documents over HTTP. Follows Robots Exclusion Rules (including <META> tags).
Use ``filters'' to index any type of file such as PDF, gzip, or Postscript
Use an external program to supply documents to swish, such as an advanced spider for your web server, or a program to read and format records from a relational database management system (RDBMS).
Document ``properties'' (some subset of the source document, usually defined as a META or XML elements) may be stored in the index and returned with search results
Document summaries can be returned with each search
Word stemming and soundex indexing
Phrase searching and wildcard searching
Limit searches to HTML links
Use powerful Regular Expression to select documents for indexing
Easily limit searches to parts or all of your web site
Results can be sorted by relevance or by any number of properties in ascending or decending order
Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.
Can report structural errors in your XML and HTML documents
Includes example search scripts
SWISH-E is fast.
It's open source and FREE! You can customize SWISH-E and you can contribute your fancy new features to the project.
Supported by on-line user and developer group
[ TOC ]
The current version of SWISH-E can be found at:
Please make sure you use a current version of swish-e.
Information about Windows binary distributions can also be found at this site.
[ TOC ]
Read the INSTALL page. Information on specific ports (VMS and Win32) can be found in sub-directories of the src directory.
The Windows binary can be found in the src/win32
directory.
In addition to the INSTALL page, make sure you read the SWISH-FAQ page if you have any questions, or to get an idea of questions that you might someday ask.
[ TOC ]
Documetation is provided in the SWISH-E distribution package in two forms, POD (Plain Old Documentation), and in html format. The POD documentation is in the pod directory, and the HTML documentation is in the html directory, of course.
If your system includes the required support files and programs, the distribution make files can also generate the documentation in these formats:
Postscript PDF (Adobe Acrobat) system man pages |
You may also build a ``split'' version of the documentation where each topic heading is a separate web page. Building the split version also creates a SWISH-E index of the documentation that makes the documentation searchable via the included Perl CGI program.
Buiding these other forms of documentation require aditional helper applications -- most modern Linux distributions will include all that's needed. At least mine does... You shouldn't have a problem if you have kept your perl and perl libraries up to date.
Online documentation can be found at the SWISH-E web site listed above.
See INSTALL for information on creating the PDF and Postscript versions of the
documentation, and for information on installing the SWISH-* documentation
as Unix man(1)
pages.
[ TOC ]
The SWISH-E documentation included with the SWISH-E distribution is in POD and HTML formats. The POD documentation can be found in the pod directory, and the HTML documentation can be found in the html directory.
To view the HTML documentation point your browser to the html/index.html file.
The POD documentation is displayed by the ``perldoc'' command that is included with every Perl installation. For example, to view the swish-e installation documentation page called ``INSTALL'', type
perldoc pod/INSTALL |
or to make life easier,
cd pod perldoc INSTALL perldoc SWISH-RUN |
Complain to your system administrator if the perldoc
command is not available on your machine.
[ TOC ]
The following documentation is included in this SWISH-E distribution:
README -- this file
INSTALL -- Installation and basic usage instructions
SWISH-CONFIG -- Configuration File Directives
SWISH-RUN -- Running Swish and Command Line Switches
SWISH-SEARCH --All about Searching with SWISH-E
SWISH-FAQ -- Common questions, and some answers
SWISH-LIBRARY -- Interface to the SWISH-E C library
SWISH-PERL -- Instructions for using the Perl library
CHANGES -- List of feature changes and bug fixes
SWISH-BUGS -- List of known bugs in the release.
[ TOC ]
The SWISH-E documentation in HTML format was created with Pod::HtmlPsPdf, a
package of Perl modules written and/or modified by Stas Bekman to automate
the conversion of documents in pod format (see perldoc perlpod) to HTML,
Postscript, and PDF. A slightly modified version of this package is include
with the SWISH-E distribution and used for building the HTML. As
distributed, SWISH-E contains only the pod and HTML documentation. See INSTALL for instructions on creating man(1),
Postscript, and PFD
formats.
Thanks, Stas, for your help!
[ TOC ]
Here's an overview of the directories included in the swish-e distribution:
Example swish-e configuration setups to help you get started. In the stopwords
sub-directory are a number of stopword files for different languages.
Contains files required for building the HTML, PDF, and Postscript documentation.
This contains a sample CGI front-end for running swish-e.
Sample programs to use with swish-e's ``filters''. Examples include PDF, MS Word, and binary strings filters.
The documentation in HTML format.
The perl interface to the swish-e C library.
The documentation in perldoc (pod) format.
Example programs and modules to use with the ``prog'' document source access method. Examples include a web spider, and a program to filter pdf and MS word documents.
This directory contains the source code for swish-e. OS-specific directories are also found here.
The documents used for running make test
.
[ TOC ]
If you need help with installing or using SWISH-E please subscribe to the SWISH-E mailing list. See visit the SWISH-E web site listed above for information on subscribing to the SWISH-E list.
Before posting any questions please read QUESTIONS AND TROUBLESHOOTING in the INSTALL documentation page.
[ TOC ]
Please contact the SWISH-E list with corrections to this documentation. Any help in cleaning up the docs will be appreciated!
[ TOC ]
SWISH-E is currently being developed as an open source project on SourceForge http://sourceforge.net.
Contact the SWISH-E list for questions.
[ TOC ]
Each document in the SWISH-E distribution contains this section. It refers only to the specific page it's located in, and not to the SWISH-E program or the documentation as a whole.
$Id: README.pod,v 1.8 2001/09/29 15:38:54 whmoseley Exp $
. [ TOC ]