The Isearch Indexing and Search Engine is free and available for UNIX; plus:
The Isearch ftp download area is at ftp://ftp.cnidr.org/pub/software/Isearch/.
Also be aware that there appear to be other distributions and further information at:
For the CCP14, all HTML (.html and .htm) files and relevant text files are indexed separately for each virtual domain (www, alife, programming, netlib, gnu, etc.). The indexes were kept separate to allow for possible regional mirroring.
For the CCP14, there are several "virtual domains", each with its own search database, so each search database has its own cgi-bin directory.
In this case, for the ccp14web index, the three wrapper scripts (ifetch, ihtml and isearch) are put in /usr/local/etc/httpd/cgi-bin/ccp14/, as designated by the Apache 1.3.x configuration. The CGI executables themselves are kept in /usr/local/etc/httpd/cgi-bin/, so that the different virtual domains (www, netlib, gnu, programming, alife) all use the same executables.
#!/bin/sh
# From this script, run the isrch_fetch utility and pass 4 arguments:
# the directory where your databases are stored, followed by $1 $2 $3.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_fetch /path/to/my/databases $1 $2 $3
exec /usr/local/apache/share/cgi-bin/isrch_fetch /web_disc/ccp14/web_area/isearch/ccp14web $1 $2 $3
#!/bin/sh
# From this script, run the isrch_html utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_html /path/to/my/databases
exec /usr/local/apache/share/cgi-bin/isrch_html /web_disc/ccp14/web_area/isearch/ccp14web
#!/bin/sh
# From this script, run the isrch_srch utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_srch /path/to/my/databases
exec /usr/local/apache/share/cgi-bin/isrch_srch /web_disc/ccp14/web_area/isearch/ccp14web
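Because each virtual domain gets its own copies of these three wrappers, differing only in the database directory, they could be generated rather than edited by hand. The following is a minimal POSIX sh sketch, assuming the wrapper names ifetch, ihtml and isearch and the shared executable directory used in the scripts above; make_wrappers is a hypothetical helper, not part of Isearch-cgi.

```shell
#!/bin/sh
# Sketch: generate the three per-domain CGI wrapper scripts.
# ASSUMPTIONS: the wrappers are named ifetch, ihtml and isearch, and the
# shared Isearch-cgi executables live in CGI_BIN. make_wrappers is a
# hypothetical helper, not part of Isearch-cgi.
CGI_BIN=${CGI_BIN:-/usr/local/apache/share/cgi-bin}

# Usage: make_wrappers <per-domain cgi-bin dir> <Iindex database dir>
make_wrappers() {
    dest=$1
    dbdir=$2
    mkdir -p "$dest" || return 1
    # isrch_fetch gets the database directory plus the three CGI arguments,
    # left unquoted as in the original script so that missing ones are dropped
    printf '#!/bin/sh\nexec %s/isrch_fetch %s $1 $2 $3\n' \
        "$CGI_BIN" "$dbdir" > "$dest/ifetch" || return 1
    # isrch_html and isrch_srch take only the database directory
    printf '#!/bin/sh\nexec %s/isrch_html %s\n' "$CGI_BIN" "$dbdir" > "$dest/ihtml" || return 1
    printf '#!/bin/sh\nexec %s/isrch_srch %s\n' "$CGI_BIN" "$dbdir" > "$dest/isearch" || return 1
    chmod 755 "$dest/ifetch" "$dest/ihtml" "$dest/isearch"
}
```

For example, make_wrappers /usr/local/etc/httpd/cgi-bin/netlib /web_disc/ccp14/web_area/isearch/netlib would set up wrappers for a netlib domain, assuming that is where its index lives.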
As web pages are automirrored with WGET between 1am and 5am each morning, the Iindex databases must be regenerated after each mirroring session so that they reflect the changes. While an incremental update is feasible using the Iindex "-a" option, subscribers to the Isearch mailing list recommend simply regenerating the database from scratch in this situation.
Note: If the cron script does not seem to be working, check that you have either given the full path to Iindex (e.g. /usr/local/bin/Iindex, as in the script below) or that its directory is in the default PATH used by cron.
In the .crontab file (which can then be loaded with the command crontab .crontab), add an entry for the script that is to run after the automirroring. Crontab fields are minute, hour, day of month, month and day of week, so the entry below runs the script each morning at 7:05am; to run it at 5:07am instead, the entry would start with 07 05.
05 07 * * * ./isearch.index.script
This runs a script that regenerates each index using the recommended method (writing a file listing all the files to be indexed, then running Iindex on that file). The new index is built in a temporary directory and then moved over the old one, so that searching is unavailable for only a fraction of a second. The last line emails the report file, confirming that the script has run and when each index was completed (the script below sends it to ccp14@ccp14.ac.uk).
#!/bin/csh
# You should CHANGE THE NEXT 3 LINES to suit your local setup
setenv LOGDIR ./web_area/mirrorbin/logs   # directory for storing logs
setenv PROGDIR ./web_area/mirrorbin       # location of executable
setenv PUTDIR ./web_area/web_live/ccp     # relative directory for mirroring
# relative due to possible kludge in wget
# can change to absolute if you wish - some internal links may not work

set DATE=(`date`)
sed "/START_Iindex/s/NOT_FINISHED/Regeneration_STARTED $DATE/" ./report-template.txt > ./report.txt.new
mv report.txt.new report.txt

# ccp14web: HTML files, plus text and readme files
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/web_live/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/web_live/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
find web_area/web_live/ -name "*.txt" -type f -print > web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.1st" -type f -print >> web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.2nd" -type f -print >> web_area/isearch/tmpfile.txt2
grep -v Ray-Tracing-News web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
grep -v CCP14-by-OS web_area/isearch/tmpfile.txt2 > web_area/isearch/tmpfile.txt2a
grep -v ccp14-by-program web_area/isearch/tmpfile.txt2a > web_area/isearch/tmpfile.txt2b
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SIMPLE -a -f web_area/isearch/tmpfile.txt2b >> web_area/isearch/summary.txt
mv web_area/isearch/ccp14web web_area/isearch/ccp14webold
mv web_area/isearch/temp web_area/isearch/ccp14web
rm -rf web_area/isearch/ccp14webold
# Note: in sh, 2>&1 sends standard error to the file as well
# (the csh equivalents are >& and >>&).

# wwwxrd: HTML files, excluding the web_stats pages
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/xrd/web/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/xrd/web/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
grep -v web_stats web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
/usr/local/bin/Iindex -d web_area/isearch/temp/wwwxrd -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta > web_area/isearch/summary.txt
mv web_area/isearch/wwwxrd web_area/isearch/wwwxrdold
mv web_area/isearch/temp web_area/isearch/wwwxrd
rm -rf web_area/isearch/wwwxrdold

set DATE=(`date`)
sed "/WWWXRD_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

set DATE=(`date`)
sed "/WWW_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" ./report.txt > ./report.txt.new
mv report.txt.new report.txt

# programming: HTML files plus text files
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/programming/ -name "*.html" -type f -print > web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name "*.txt" -type f -print > web_area/isearch/tmpfile2.programming.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.programming.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.programming.txt >> web_area/isearch/summary.txt
mv web_area/isearch/programming web_area/isearch/progwebold
mv web_area/isearch/temp web_area/isearch/programming
rm -rf web_area/isearch/progwebold

set DATE=(`date`)
sed "/PROGRAMMING_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# alife: HTML files plus text files
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/alife/ -name "*.html" -type f -print > web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name "*.txt" -type f -print > web_area/isearch/tmpfile2.alife.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.alife.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.alife.txt >> web_area/isearch/summary.txt
mv web_area/isearch/alife web_area/isearch/alifewebold
mv web_area/isearch/temp web_area/isearch/alife
rm -rf web_area/isearch/alifewebold

set DATE=(`date`)
sed "/ALIFE__Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# netlib: HTML files, plus text, source and documentation files
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/netlib/ -name "*.html" -type f -print > web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name "*.txt" -type f -print > web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "readme" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.c" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.src" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.f" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manual" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manlc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "helplc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "imsl" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "nag" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "port" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "siam" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "index" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "doc" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "source" -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.text" -type f -print >> web_area/isearch/tmpfile.netlib.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SGMLTAG -f web_area/isearch/tmpfile.netlib.html.txt > web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SIMPLE -a -f web_area/isearch/tmpfile.netlib.txt >> web_area/isearch/summary.txt
mv web_area/isearch/netlib web_area/isearch/netlibwebold
mv web_area/isearch/temp web_area/isearch/netlib
rm -rf web_area/isearch/netlibwebold

set DATE=(`date`)
sed "/NETLIB_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# wwwxrd again, this time without the web_stats exclusion
rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/xrd/web/ -name "*.html" -type f -print > web_area/isearch/tmpfile.txt
find web_area/xrd/web/ -name "*.htm" -type f -print >> web_area/isearch/tmpfile.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/wwwxrd -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txt > web_area/isearch/summary.txt
mv web_area/isearch/wwwxrd web_area/isearch/wwwxrdold
mv web_area/isearch/temp web_area/isearch/wwwxrd
rm -rf web_area/isearch/wwwxrdold

# Mail the completed report
/usr/sbin/Mail -s "Isite_Isearch_Creation_Results `date`" ccp14@ccp14.ac.uk < ./report.txt
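Each section of the script repeats the same steps: build a file list with several find calls, filter it with grep -v, run Iindex into web_area/isearch/temp, then swap the new directory in. Since csh has no functions, the two reusable steps are sketched here as POSIX sh functions instead. build_file_list and swap_index are hypothetical names, the exclusion is treated as a fixed string (the grep -v patterns in the script contain no special characters), and the Iindex calls themselves would stay as they are.

```shell
#!/bin/sh
# Sketch only: helpers factoring out steps the csh script repeats for each
# virtual domain. The function names are hypothetical.

# Usage: build_file_list ROOT OUT EXCLUDE PATTERN...
# Writes files under ROOT matching any -name PATTERN to OUT, dropping lines
# containing the fixed string EXCLUDE (pass '' for no exclusion).
# Returns non-zero if the filtered list is empty (grep -v selects nothing).
build_file_list() {
    root=$1 out=$2 exclude=$3
    shift 3
    [ $# -gt 0 ] || { echo "build_file_list: no patterns" >&2; return 1; }
    # Rewrite the positional parameters as: -name p1 -o -name p2 ...
    first=1
    for pat in "$@"; do
        if [ "$first" = 1 ]; then
            set -- "$@" -name "$pat"
            first=0
        else
            set -- "$@" -o -name "$pat"
        fi
        shift   # drop the original pattern from the front
    done
    # One find pass instead of one find per pattern
    find "$root" -type f \( "$@" \) -print |
        if [ -n "$exclude" ]; then grep -v -F -e "$exclude"; else cat; fi > "$out"
}

# Usage: swap_index LIVE NEW
# Replaces the live index directory with the freshly built one; the index
# is unavailable only between the two mv commands.
swap_index() {
    live=$1 new=$2
    [ -d "$new" ] || { echo "swap_index: $new is not a directory" >&2; return 1; }
    rm -rf "$live.old"
    if [ -d "$live" ]; then
        mv "$live" "$live.old" || return 1
    fi
    if ! mv "$new" "$live"; then
        # Put the previous index back if the new one could not be moved in
        [ -d "$live.old" ] && mv "$live.old" "$live"
        return 1
    fi
    rm -rf "$live.old"
}
```

For example, the first wwwxrd section could become build_file_list web_area/xrd/web web_area/isearch/tmpfile.txta web_stats '*.html' '*.htm', followed by the unchanged Iindex command and then swap_index web_area/isearch/wwwxrd web_area/isearch/temp.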
Operation of Isearch-cgi
------------------------

1) Create access points to databases

Create a base HTML file with the program search_form. It takes two arguments: the path to your databases, and the name of the database this new page should access. The page is printed to standard output, so you may redirect it to a file if you like.

search_form /home/databases TEST > form.html

There is another, optional argument that indicates to search_form which type of search page you wish to generate. The form types are:

-simple
-boolean
-advanced
-html

If no type is given to search_form, it will default to -simple.

Examples:

search_form -simple /home/databases TEST > form.html
search_form -boolean /home/databases TEST > boolean.html
search_form -advanced /home/databases TEST > advanced.html
search_form -html /home/databases TEST > htmlform.html
For example, to generate these forms for the CCP14 crystallographic Iindex database (ccp14web), use the command lines:

search_form -simple /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > form.html
search_form -boolean /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > boolean.html
search_form -advanced /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > advanced.html
search_form -html /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > htmlform.html

Then edit the resulting HTML files into the form you want. For the CCP14 crystallographic search page, only the boolean and advanced forms have been used. The searchable fields are Full Text, TITLE, HEAD and ADDRESS, with Full Text as the default; the result display options are TITLE, HEAD and ADDRESS, with TITLE as the default; and the operators AND, OR, NOT and NEAR, selected from a menu, relate the keywords, with AND as the default.
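Since the four search_form calls differ only in the form type and the output file, they can be run in a loop. The following is a sketch in POSIX sh; make_forms and the SEARCH_FORM override are hypothetical conveniences, and the output file names follow the search_form examples above.

```shell
#!/bin/sh
# Sketch: run search_form once per form type for a single database.
# SEARCH_FORM and make_forms are hypothetical; SEARCH_FORM defaults to the
# search_form program found on PATH.
SEARCH_FORM=${SEARCH_FORM:-search_form}

# Usage: make_forms <database directory> <database name> [output dir]
make_forms() {
    dbdir=$1
    db=$2
    outdir=${3:-.}
    for type in simple boolean advanced html; do
        # Map each form type to the file name used in the examples
        case $type in
            simple) out=form.html ;;
            html)   out=htmlform.html ;;
            *)      out=$type.html ;;
        esac
        "$SEARCH_FORM" "-$type" "$dbdir" "$db" > "$outdir/$out" || return 1
    done
}
```

Running make_forms /web_disc/ccp14/web_area/isearch/ccp14web ccp14web would produce the same four files as the commands above, ready for hand editing.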
Isearch-CGI Setup for the web
(There is also a Word Document that goes into the setup of the Web Interface for Isearch)