Using Unicode support in ASPseek
================================

Since version 1.1, ASPseek can store words in Unicode.

If the web pages to be indexed by ASPseek use mutually incompatible
encodings (for example, Cyrillic, Western European and Arabic), you should
turn on Unicode support.
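
To see why mixed encodings cause trouble, consider a single byte. The
Python sketch below (an illustration only; ASPseek itself is not written in
Python) decodes byte 0xF6 with three common 8-bit charsets and gets three
different letters. In Unicode, each of those letters has its own code point.

```python
# One byte, three charsets: an index that stores raw 8-bit words
# cannot tell these letters apart.
RAW = b"\xf6"
CHARSETS = ("iso-8859-1", "windows-1251", "koi8-r")

# Decode the same byte once per charset.
meanings = {cs: RAW.decode(cs) for cs in CHARSETS}

for cs, ch in meanings.items():
    # Show the charset, the decoded letter and its Unicode code point.
    print(f"{cs:>12}: {ch} U+{ord(ch):04X}")
```

The three results are o-umlaut (U+00F6), Cyrillic small tse (U+0446) and
Cyrillic capital zhe (U+0416).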

To use this feature, perform the following steps:

1) Run configure with the --enable-unicode option

2) Compile and install ASPseek as usual

3) Add (or uncomment) the line "Include ucharset.conf" in both "aspseek.conf"
and "searchd.conf"

4) Optionally, disable encodings you do not need in "ucharset.conf", or add
encodings that are not yet included. The format of Unicode charset files is
described in etc/tables/README

5) Create a new database and its tables using "tables.sql", then index your
pages into that database

6) Create either several international search pages, each in a different
encoding, or a single international search page in the "utf-8" encoding. In
the latter case you can search in any language from the same page.

7) Add a hidden variable to your search forms:

	<input type="hidden" name="cs" value="CHARSET">

where CHARSET is the character set of the search page containing the form.

Also add a hidden variable to the search form in the result template:

	<input type="hidden" name="cs" value="$c">

OR

If the initial search form is generated by s.cgi, instead add a hidden
variable to the search form in the result template:

	<input type="hidden" name="cs" value="CHARSET">

where CHARSET is the character set of the page generated by s.cgi. You can
set this character set by adding the following line to the "top" template
section:

	<META http-equiv="Content-Type" content="text/html; charset=CHARSET">

8) Add the parameter cs=&c to the "cached link" in the "cached" template.
See s.htm-dist for an example.
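
The hidden "cs" variable is needed because a browser submits form data as
percent-escaped bytes in the page's own charset, usually without saying which
charset that was. The Python sketch below illustrates the idea; the function
decode_query and its ISO-8859-1 fallback are hypothetical and do not
describe how s.cgi is actually implemented.

```python
from urllib.parse import unquote_to_bytes


def decode_query(query_string, default_charset="iso-8859-1"):
    """Decode form fields using the charset named in the 'cs' field.

    Hypothetical helper; default_charset is an assumed fallback.
    """
    raw = {}
    for pair in query_string.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # Form encoding turns spaces into '+' and percent-escapes raw bytes.
        raw[name] = unquote_to_bytes(value.replace("+", " "))
    cs_bytes = raw.get("cs")
    charset = cs_bytes.decode("ascii") if cs_bytes else default_charset
    # Only now can the bytes be turned into characters.
    return {name: data.decode(charset) for name, data in raw.items()}
```

For example, decode_query("q=%C3%B6&cs=utf-8")["q"] yields o-umlaut, while the
same bytes declared as cs=iso-8859-1 yield the two unrelated characters
U+00C3 and U+00B6.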


The format of the "StopwordFile", "Spell" and "Affix" directives has changed
in the Unicode version: an optional parameter can be added to specify the
charset of the files these directives refer to. If this parameter is omitted,
the charset specified by the "LocalCharset" directive is used.
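
The fallback rule can be sketched as follows. This is a hypothetical Python
helper, not ASPseek code: the function name read_word_list is illustrative,
and it assumes a plain word list with one word per line.

```python
def read_word_list(data, local_charset, charset=None):
    """Decode a word-list file given as raw bytes (hypothetical helper).

    local_charset -- stands in for the "LocalCharset" directive
    charset       -- the directive's optional charset parameter
    Assumes one word per line; blank lines are skipped.
    """
    # Prefer the directive's own charset, else fall back to LocalCharset.
    text = data.decode(charset or local_charset)
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For a KOI8-R stopword file on a server whose LocalCharset is ISO-8859-1,
read_word_list(data, "iso-8859-1", "koi8-r") decodes the words correctly;
omitting the last argument would decode them as ISO-8859-1 and garble them.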

"CharsetTable" directives are unnecessary in the Unicode version.

The "LocalCharset" directive is also unnecessary in the Unicode version, but
it is still useful as the default charset for "StopwordFile", "Spell" and
"Affix" files, as described above.

Only the Unicode version fully supports HTML entities such as &ouml; or &#246;.
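
As an illustration of what such support involves, Python's standard
html.unescape shows that the named and numeric forms all map to the same
code point, and that a numeric reference can name a character the page's
8-bit charset cannot store at all. This sketches the concept only; it is not
ASPseek's entity handling.

```python
import html

# Named, decimal and hexadecimal references to the same letter.
FORMS = ("&ouml;", "&#246;", "&#xF6;")
decoded = {form: html.unescape(form) for form in FORMS}

# &#1094; is U+0446 (Cyrillic small letter tse), absent from ISO-8859-1.
tse = html.unescape("&#1094;")


def fits_charset(text, charset):
    """Return True if text can be encoded in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


print(set(decoded.values()) == {"\u00f6"})  # True: all three are U+00F6
print(fits_charset(tse, "iso-8859-1"))      # False: no byte for U+0446
print(fits_charset(tse, "utf-8"))           # True
```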

Note that the database structure of the Unicode version is incompatible with
that of the non-Unicode version, so you need to re-index everything into the
new database.
