|
| Search
Features — International Languages |
|
|
| Unicode
Support |
 |
Unicode
support allows for indexing and searching of
non-English text, including every character set
supported by the Unicode standard. |
 |
In
addition to Unicode support, dtSearch offers
extensive alphabet customization options. |
 |
See Unicode
FAQ for more technical information. |
|
|
| Language
Extension Packs |
 |
The
dtSearch product line includes an English noise
word list and stemming rules (to find linguistically
related words such as learn, learned, learns
and learning). |
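To make the two terms concrete, here is a minimal sketch of how a noise word list and suffix-based stemming rules can work together at indexing time. It is a toy illustration only: the word list, the suffix rules, and the names `NOISE_WORDS`, `SUFFIXES`, `stem` and `index_terms` are all hypothetical, and none of it reflects dtSearch's actual rules or API.

```python
# Toy illustration only: NOT dtSearch's noise word list or stemming rules.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "is"}  # hypothetical sample list
SUFFIXES = ("ing", "ed", "s")  # hypothetical rules, tried in this order


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


def index_terms(text):
    """Return the stems of the words in text, skipping noise words."""
    return [stem(w) for w in text.split() if w.lower() not in NOISE_WORDS]


if __name__ == "__main__":
    # All four forms reduce to the same stem, so a search for one finds the others.
    print([stem(w) for w in ("learn", "learned", "learns", "learning")])
    print(index_terms("the student is learning"))
```

Real stemming rules are language-specific and considerably more careful than this, which is why separate rule sets are needed for each language.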
 |
dtSearch's
UK distributor offers Language Extension Packs: pre-packaged sets of noise
word lists and stemming rules covering a wide
variety of European languages. |
 |
The
Western European group includes (in addition
to English): Danish, Dutch, Finnish, French,
German, Italian, Norwegian, Portuguese, Spanish
and Swedish. |
 |
The
Eastern European group includes: Belarusian, Bosnian, Bulgarian, Croatian, Czech, Estonian, Greek, Hungarian, Latvian, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Turkish and Ukrainian. See also the Cyrillic
article. |
 |
Licensing:
dtSearch Corp. can add either the Western
European group or the Eastern European group
to a signed dtSearch developer license.
Please contact
dtSearch for details. Both packages
may also be licensed directly from www.dtsearch.co.uk. |
 |
More
information on the Language
Extension Packs |
 |
Request a trial version |
 |
Visit
distributor's site in English, Français, Deutsch |
|
| Chinese, Japanese and Korean Text With No Word Breaks |
 |
Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words. |
 |
Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word. |
 |
To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so that each character is treated as a single word. |
 |
dtSearch Desktop/Network: In Options > Preferences > Letters and Words, check the box to “Insert word breaks between Chinese, Japanese, and Korean characters in text.” |
 |
dtSearch Developer API: set dtsoTfAutoBreakCJK in Options.TextFlags. |
| Language
Analyzer API Integration |
 |
The dtSearch Engine includes a language
analyzer API that can be used to integrate
morphological analyzers and custom or dictionary-based
word breakers into the dtSearch Engine indexing
process. |
 |
The
dtSearch Engine offers integration
with Basis Technology's Rosette Linguistics
Platform for enhanced Chinese, Japanese and
Korean text retrieval. |
 |
The
dtSearch Engine also includes an API for substituting
a non-English language thesaurus for the existing
English-language one. |
 |
The
dtSearch product line can instantly search
terabytes of text across a desktop, network,
Internet or Intranet site. |
dtSearch
products also serve as tools for publishing
large document collections, with instant text
searching, to Web sites or CD/DVDs. |
 |
over
two dozen indexed, unindexed, fielded and full-text
search options |
 |
highlights
hits in HTML, XML and PDF, while displaying
embedded links, formatting and images |
 |
converts
other file types — word processor, database,
spreadsheet, email and full-text of email attachments,
ZIP, Unicode, etc. — to HTML for display
with highlighted hits |
 |
built-in Spider adds
a third-party or other Web site (public, secure
content, password accessible, etc.) to your searchable
database |
 |
Spider supports
Web-based content (HTML, PDF, XML, etc.) as well
as dynamically-generated content (ASP.NET, MS CMS,
SharePoint, etc.) |