Why Use SWISH-E
Where to get SWISH-E 2

Due to needing the -e option (which uses temporary files rather than holding everything in memory when generating the index), the latest developmental beta version (2.1.8) of SWISH-E 2 was used. Without the -e option, the latest SWISH-E may overwhelm your server's memory and slow it to a crawl when indexing large sites. The built-in sample Perl module shipped with SWISH-E 2.1.x was not used for the CGI; instead, the older (more established) Perl scripts were used, as it seemed better at this time to avoid the bleeding-edge beta features. Upgrading to it at a later date would be a good idea for enhanced CGI script security.
Compiling and Installing the Web Server

This assumes you are using a UNIX machine and have cc or gcc installed. Refer to Compiling the GNU GCC C compiler for information on how to do this. On a decent workstation, gcc should be there by default; the ./configure script should be able to detect it, making compiling a relatively trivial exercise.
Deciding What to Index

In the case of the CCP14, all html and htm files are indexed for each relevant virtual domain (crystallographic www, alife, programming, etc.). For possible regional mirroring purposes, it was decided to keep the indexes separate; this also limits irrelevant hits caused by mirroring different subject areas.
Config Files

For SWISH-E, the important thing is to set up config files that are optimised for the locally based information. Sometimes the best settings can only be found by playing around with the options and seeing what effect they have. CCP14-relevant issues found (there may be better ways to handle this):
SWISH-E Config Files (more options are available with SWISH-E - check the documentation)
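As a hypothetical illustration, a minimal config along these lines might look like the following. The paths, URL, and index filename are placeholders, not the actual CCP14 settings, and only a few common directives are shown - check the SWISH-E documentation for the full list:

# Hypothetical minimal SWISH-E config file.
# Local paths and the URL below are placeholders for illustration.
IndexDir /web_disc/example/web_live           # tree of files to index
IndexFile /web_disc/example/swish.index.new   # write the new index here
IndexOnly .html .htm                          # file extensions to index
IndexReport 1                                 # terse progress reporting
FollowSymLinks yes
# Map local filesystem paths back to the public URL in search results:
ReplaceRules replace "/web_disc/example/web_live" "http://www.example.org"

Note that IndexFile points at a ".new" name; the regeneration script below relies on building the new index beside the live one and then renaming it into place.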
Example live search files:
Daily Auto Indexing - Creation of the SWISH-E Search Database

As automirroring of webpages is performed between 1am and 5am each morning using WGET and rsync, the SWISH-E database must be regenerated after each auto-mirroring session to reflect the changes. In the .crontab file (which can be installed into the crontab using the command crontab .crontab), put the script file that is to be run after the automirroring. In this case, the script will run each morning at 5.07am.
07 05 * * * ./swish.searchsindex.script

(Note the crontab field order: minute first, then hour, so 07 05 means 5.07am.) This calls a script file that regenerates each index using the recommended method (generating a file of all the files to be indexed, then running the indexer on this file), then moves the new index over the old one so that downtime of the index is kept to a fraction of a second. The last lines send an email to ccp14@dl.ac.uk confirming the script has run and the time it completed.
#!/bin/csh
# You should CHANGE THE NEXT 3 LINES to suit your local setup
setenv LOGDIR ./web_area/mirrorbin/logs    # directory for storing logs
setenv PROGDIR ./web_area/mirrorbin        # location of executable
setenv PUTDIR ./web_area/web_live/ccp      # relative directory for mirroring
                                           # relative due to possible kludge in wget
                                           # can change to absolute if you wish - some internal links may not work

set DATE=(`date`)
sed "/START_Iindex/s/NOT_FINISHED/Regeneration_STARTED $DATE/" ./report-template.txt > ./report.txt.new
mv report.txt.new report.txt

# Some strange things have been happening that have been accumulating Iindex jobs - thus make
# sure they are dead.  Lachlan 3rd June 1999
/etc/killall -9 swish-e
wait

# CCP14 ONLY SEARCH
/usr/local/bin/swish-e -e -c /web_disc/ccp14/search-databases/config.ccp14only > \
    /web_disc/ccp14/search-databases/config.ccp14only.log
mv /web_disc/ccp14/search-databases/swish.index.ccp14only.new \
   /web_disc/ccp14/search-databases/swish.index.ccp14only
#wait
# 2>&1 - puts standard err to the file as well.
set DATE=(`date`)
sed "/CCP14only_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# CCP14 ALL CRYSTALLOGRAPHIC PAGES
/usr/local/bin/swish-e -e -c /web_disc/ccp14/search-databases/config.ccp14all > \
    /web_disc/ccp14/search-databases/config.ccp14all.log
mv /web_disc/ccp14/search-databases/swish.index.ccp14all.new \
   /web_disc/ccp14/search-databases/swish.index.ccp14all
set DATE=(`date`)
sed "/CCP14ALL_index/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# ALIFE SEARCH
/usr/local/bin/swish-e -e -c /web_disc/ccp14/search-databases/config.alife > \
    /web_disc/ccp14/search-databases/config.alife.log
mv /web_disc/ccp14/search-databases/swish.index.alife.new \
   /web_disc/ccp14/search-databases/swish.index.alife
set DATE=(`date`)
sed "/ALIFE__Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# PROGRAMMING SEARCH
/usr/local/bin/swish-e -e -c /web_disc/ccp14/search-databases/config.programming > \
    /web_disc/ccp14/search-databases/config.programming.log
mv /web_disc/ccp14/search-databases/swish.index.programming.new \
   /web_disc/ccp14/search-databases/swish.index.programming
set DATE=(`date`)
sed "/PROGRAMMING_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
mv report.txt.new report.txt

# set DATE=(`date`)
# sed "/NETLIB_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt > report.txt.new
# mv report.txt.new report.txt

# Some strange things have been happening that have been accumulating Iindex jobs - thus make
# sure they are dead.  Lachlan 3rd June 1999
/etc/killall -9 swish-e
wait

/usr/sbin/Mail -s "Isite_Isearch_Creation_Results `date`" ccp14@dl.ac.uk < ./report.txt
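The key trick in the script above is that each index is written to a ".new" file and then renamed over the live index, so searches never see a half-written index. A minimal sketch of that pattern in POSIX sh, with a placeholder path and echo standing in for the swish-e run:

```shell
#!/bin/sh
# Sketch of the index-swap pattern: build the new index under a
# temporary ".new" name, then mv it over the live index.  A rename on
# the same filesystem completes in a single step, so search downtime
# is a fraction of a second.  The path is a placeholder for illustration.
INDEX=/tmp/swish.index.demo

# Stand-in for "swish-e -e -c config", which would write the new index:
echo "fresh index data" > "$INDEX.new"

# Swap the new index over the old one in a single rename:
mv "$INDEX.new" "$INDEX"

cat "$INDEX"
```

If the indexing run fails partway through, the mv never happens and the previous index stays in service untouched.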
Creating Default HTML Forms Files

Copy and edit seems to be the name of the game here, as per the following:
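As a hypothetical starting point for such editing, a minimal search form might look like the following; the ACTION path and the field name are placeholders that must match whichever Perl CGI search script is actually installed:

<!-- Hypothetical minimal search form: the ACTION path and the NAME of
     the text field must match the installed Perl CGI search script. -->
<FORM METHOD="GET" ACTION="/cgi-bin/swish-search.cgi">
  Search the site: <INPUT TYPE="text" NAME="keywords" SIZE="30">
  <INPUT TYPE="submit" VALUE="Search">
</FORM>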