WAIS (Wide Area Information Servers) was developed by Thinking Machines Corporation in 1988 for indexing and searching collections of documents. It employs a client/server architecture. It was an advance made necessary by the large number of documents residing on web sites: free text searches such as "grep" were too slow to be applied against large numbers of documents. WAIS speeds up the process by doing the work up front: the documents are indexed once, and each search is then run against the index rather than the documents themselves. A WAIS search returns the titles of the documents best matching the search.
Indexing a site creates databases (called sources) from its documents. This is done by the program waisindex. The sources generated are used by waisserver. The program waisq is the client interface to the WAIS server.
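To illustrate why indexing up front is faster than scanning every file at query time, here is a minimal sketch of an inverted index in Python. It is not the file format that waisindex writes; the function names (tokenize, build_index, search) and the sample documents are hypothetical and only show the general idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document titles that contain it.

    docs: dict of title -> document text (hypothetical sample data).
    This work is done once, before any search is run.
    """
    index = defaultdict(set)
    for title, text in docs.items():
        for word in tokenize(text):
            index[word].add(title)
    return index


def search(index, query):
    """Return the titles containing every query word, using index lookups only."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "Install guide": "How to install the server on Linux",
    "Index guide": "How to index documents for the server",
}
index = build_index(docs)
print(sorted(search(index, "server index")))
```

The search never reopens the documents; it only intersects the sets stored in the index, which is the property that makes a pre-built index faster than a "grep" over the whole collection.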
WAIS incorporates relevance ranking, which assigns a weight to every indexed word. Words appearing in a title are assigned a higher relevance. Words which are used less often get a higher ranking. The number of times a word is used in a document and the size of the document also influence the weighting of the word in the index.
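The factors listed above can be combined in many ways. The following Python sketch shows one possible combination, loosely in the style of TF-IDF. It is an assumption for illustration only and is not the formula WAIS uses; the function word_weight, its parameters and the title_boost value are all hypothetical.

```python
import math


def word_weight(word, doc_words, title_words, doc_count, docs_with_word,
                title_boost=2.0):
    """Toy relevance weight for one word in one document (not the WAIS formula).

    doc_words:      list of all words in the document, title included
    title_words:    list of words in the document title
    doc_count:      total number of indexed documents
    docs_with_word: number of indexed documents containing the word
    title_boost:    hypothetical multiplier for words found in the title
    """
    if word not in doc_words:
        return 0.0
    # Occurrences in the document, normalized by the document size.
    frequency = doc_words.count(word) / len(doc_words)
    # Words used in fewer documents get a larger rarity factor.
    rarity = math.log((1 + doc_count) / (1 + docs_with_word)) + 1.0
    weight = frequency * rarity
    # Words that appear in the title count for more.
    if word in title_words:
        weight *= title_boost
    return weight


doc_words = ["wais", "index", "server", "index"]
common = word_weight("index", doc_words, [], doc_count=10, docs_with_word=5)
rare = word_weight("index", doc_words, [], doc_count=10, docs_with_word=1)
titled = word_weight("wais", doc_words, ["wais"], doc_count=10, docs_with_word=5)
plain = word_weight("wais", doc_words, [], doc_count=10, docs_with_word=5)
print(rare > common, titled > plain)
```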
Download from: ftp://sunsite.unc.edu/pub/packages/infosystems/wais/servers/freeWAIS/
Get binaries: freeWAIS-0.5-<UNIX type>.tar.gz where "<UNIX type>" is SunOS, Linux, AIX...
or get source code: freeWAIS-0.5.tar.gz
Note: the word "source" in the WAIS world does not always mean source code. It often means the source (as in origin) of a search index.
Man pages: ftp://sunsite.unc.edu/pub/packages/infosystems/wais/documentation/man-pages/*.1
Technical explanation of file structure (not needed): ftp://sunsite.unc.edu/pub/packages/infosystems/wais/documentation/protspec.txt
Note: metalab.unc.edu = sunsite.unc.edu.
Unzip-tar the binaries:
The essential programs you need from this are waisq, waissearch, waisserver and waisindex. (You can use swais.sh, which calls swais, to run WAIS without the network if you wish.) Place the WAIS binaries in "/usr/local/bin", "/opt/bin" or another accessible bin directory.
Indexing a collection of documents generates a "sources" database comprised of the following files:
My index script: (Indexes for use on the web)
Create synonym file if required:
/usr/local/http/wais/sources/abc_index.syn
Microsoft Monopoly
Words to be ignored are hard coded for you in waisindex.
Start script: #!/bin/csh
waisindex flags:
-d : Directory including file name prefix for source files. /usr/local/http/wais/sources/abc_index = file name without suffix for the index.
-t : Type of index created. URL = the result returned from a search will be in the form of a URL.
-r : Recurse through subdirectories. /usr/local/docs/HTML = path of the HTML documents you will be indexing.
I could never get it working from inetd. Use a script instead.
Start script used: (this statement was placed in /etc/rc.local, terminated with &)
#!/bin/csh
Explanation:
-p = Port number. The ANSI standard Z39.50 says to use port 210.
-d = Directory of index files.
inetd setup: (DID NOT WORK!!)
File: /etc/inetd.conf (single line)
# wais web index server
File: /etc/services
wais 210/tcp # wais server for web indexing
AIX start script: (started from cgi-bin by the server)
#!/bin/ksh
Perl script to invoke the WAIS client "waisq": download the scripts kidofwais.pl, print_hit_bold.pl and cgi-lib.pl and place them in your /cgi-bin/.
The cgi Perl script to execute can be found at:
Edit script:
Download script: http://ljordal.cso.uiuc.edu/print_hit_bold.pl
Edit the script variables $serverURL and $maintainer.
This requires the Perl script cgi-lib.pl:
- Previous setup is for one index -
Searching multiple indexes with one query (OPTIONAL): useful for multiple servers.
Set variable: $use_Source_table = 1;
Create file: /usr/local/http/wais/sources/Source_table
Sample:
abc_index~ABC Developer Web Site~1~ABC:~~abc_index,abc_index_2,abc_index_3
See: http://www.cso.uiuc.edu/grady.html/Source_table.txt
Note: The first line references itself and the lines which follow. Use "1" on the first line to allow it to reference other lines; those lines use "0" and do not reference anything further.
Format: a table of WAIS sources and how to process them, one entry per line, with columns separated by tildes:
wais_source_name~title_to_use~search_multiple_indices?~short_name
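As a rough illustration of the tilde-separated Source_table layout, the Python sketch below splits one entry into named fields. The first four names follow the format line given in this tutorial. Treating the sixth column as a comma-separated list of indexes is an assumption drawn from the sample entry, and the fifth column is kept as-is because its meaning is not described here. The function name parse_source_line and the dictionary keys are hypothetical.

```python
def parse_source_line(line):
    """Split one Source_table entry into named fields.

    Assumed column order (first four from the format line, the rest guessed
    from the sample entry):
    wais_source_name~title_to_use~search_multiple_indices?~short_name~?~indexes
    """
    fields = line.rstrip("\n").split("~")
    # Pad short entries so the unpacking below always sees six columns.
    fields += [""] * (6 - len(fields))
    name, title, multi, short_name, fifth, members = fields[:6]
    return {
        "source": name,
        "title": title,
        "multi": multi == "1",   # "1" = entry references other entries
        "short_name": short_name,
        "fifth": fifth,          # meaning not documented here; kept unchanged
        "members": [m for m in members.split(",") if m],  # assumed index list
    }


sample = "abc_index~ABC Developer Web Site~1~ABC:~~abc_index,abc_index_2,abc_index_3"
entry = parse_source_line(sample)
print(entry["title"], entry["members"])
```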
Copyright © 1999 by Greg Ippolito