Creating search indexes

Indexes are the core of the smart search functionality. They store information about the searchable content and define the scope of searches. When a visitor submits a search request, the system looks through the appropriate indexes instead of the actual records in the database. Indexes organize data in a way that is suitable for searching, so the smart search retrieves results faster than linear searches, particularly for large volumes of data.
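
The difference between searching an index and scanning the records themselves can be sketched in a few lines of Python. This is only a conceptual illustration, not how the smart search stores its index files; the record texts and function names are made up for the example.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase word to the set of record IDs that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index

def linear_search(records, word):
    """Scan every record; the cost grows with the total volume of text."""
    return {rid for rid, text in records.items() if word in text.lower().split()}

def indexed_search(index, word):
    """Look the word up directly instead of reading the records."""
    return set(index.get(word, set()))

# Hypothetical sample records (ID -> text).
records = {
    1: "Smart search indexes store searchable content",
    2: "Visitors submit a search request",
    3: "Custom tables hold records",
}
index = build_index(records)
print(indexed_search(index, "search"))
```

Both functions return the same records, but the indexed lookup does not need to read every record on each request, which is where the benefit for large volumes of data comes from.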

The following types of search indexes are available:

  • Documents - stores information about the pages of websites (documents in the content tree).
  • Documents crawler - directly indexes the HTML output of documents (pages).
  • Forums - stores information about the content of discussion forums.
  • Custom tables - indexes records stored in custom tables.
  • On-line forms - indexes data that the website's visitors submit through forms.
  • Users - stores information about users in the system.
  • General - stores information about system objects of a specified type.
  • Custom - allows you to use your own custom‑coded search index. Stores any kind of data depending on the implementation.

Before you can start searching content, you need to create search indexes for your website:

  1. Open the Smart search application.
  2. Click New index.
  3. Fill in the index properties. Most importantly, select the following:
    • Index type - determines what type of content the search index stores
    • Analyzer type - determines how the index breaks text into searchable tokens

    Creating a search index

  4. Click Save to create the search index.
    • The General tab of the index's editing interface opens. Here you can edit the same properties that you configured when creating the index.

  5. Open the Sites tab and select the websites where you wish to use the index. You can implement multi-site search functionality by assigning the index to more than one website.

    Note: If the index includes global objects that are not site-specific, the selection made on the Sites tab does not affect the index's content. However, you can only use the index (through Smart search web parts) on the assigned sites.

  6. If you are creating a Documents or Documents crawler type index, switch to the Cultures tab. Here you need to select which language versions of the website's documents are indexed.
    • You must assign at least one culture in order for the index to be functional.
    • If you have a multi-site index, you can select the cultures separately for each site.

  7. Switch to the Indexed content tab and define the content covered by the index. The available options depend on the type of the index.

  8. Go back to the General tab and Rebuild the index.
    • The Index info section displays information about the current status and parameters of the index.

    Editing a search index on the General tab

Once the system finishes building the index, you can start using the index on your website.

The Search preview tab allows you to quickly test the functionality of the index. To test advanced features, assign the index to a smart search results web part on a real page.

Maintaining search indexes

You can manage existing search indexes using the actions available on the General tab of the index editing interface.

The system automatically updates search indexes to reflect all changes made to the indexed content. Over time, these updates can make indexes less efficient, particularly in the case of large indexes.

To restore optimal search performance for an index, defragment the index by clicking Optimize. You can enable the Optimize search indexes scheduled task to have the system automatically optimize all smart search indexes once per week.

The Rebuild action deletes the current index file and indexes all specified content again.

  • Use the rebuild action to apply changes made to the index's configuration. This includes modifications of the analyzer settings (Analyzer type, Stop words), all options on the Indexed content, Sites or Cultures tabs, and adjustments of the search field settings for the indexed objects.
  • The system automatically optimizes the index after a successful rebuild.

Clicking the Rebuild action does not always guarantee that the index starts rebuilding immediately. The process may be delayed if another index is already being rebuilt or if the rebuilding tasks are configured to be handled by the scheduler.

Reference - Search index properties

You can configure the following options when creating new search indexes or editing existing indexes on the General tab:

Index property Description

Display name

Name of the index displayed in the administration interface.

Code name

Serves as a unique identifier for the index (used internally in web part property values or the API). You can leave the default (automatic) option to have the system generate a code name based on the display name.

Warning: The system also uses the code name for the physical index file. The fully qualified name of the file must be less than 260 characters long, including the directory path.
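
A quick way to sanity-check a planned code name against this limit is sketched below. The directory used here is purely hypothetical and only stands in for wherever your installation stores index files; the helper name is also made up for the example.

```python
MAX_PATH = 260  # limit stated above: the full name must be shorter than this

def index_path_fits(index_directory, code_name):
    """Return True if directory path + code name stays under 260 characters."""
    # Join with a single backslash, as in a Windows-style path.
    full_path = index_directory.rstrip("\\") + "\\" + code_name
    return len(full_path) < MAX_PATH

# Hypothetical directory, used only for illustration.
directory = "C:\\inetpub\\wwwroot\\MySite\\App_Data\\SearchIndexes"
print(index_path_fits(directory, "MySite_Documents"))
```

Deeply nested web project folders leave less room for the code name, so the check is most useful when the project root is already long.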

Index type

Determines what type of content the search index stores:

  • Custom index - indexes any kind of data depending on the implementation.
  • Custom tables - indexes records in custom tables.
  • Documents - indexes the content of website pages (documents in the content tree).
  • Documents crawler - indexes the HTML output of the website's documents (pages).
  • Forums - indexes the content of forums.
  • General - indexes system objects of a specified type. General indexes allow you to search through any objects within the system.
  • On-line forms - indexes data submitted by the website's visitors through forms.
  • Users - indexes the data of users in the system.

Analyzer type

Sets the type of analyzer that the index uses to tokenize text (divide text into searchable tokens). The analyzer processes both the indexed content and the search expressions entered by users. When running searches using the index, the system returns results for items that have at least one token matching the search expression.
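
The "at least one matching token" rule can be illustrated with a minimal Python sketch. The tokenizer here is a simple stand-in (lowercase and split at whitespace), not one of the actual analyzers, and the page names are invented for the example.

```python
def tokenize(text):
    """Stand-in analyzer: lowercase the text and split at whitespace."""
    return set(text.lower().split())

def matches(item_text, search_expression):
    """True if the item shares at least one token with the search expression."""
    return not tokenize(item_text).isdisjoint(tokenize(search_expression))

# Hypothetical indexed items (name -> text).
items = {
    "page-1": "Creating search indexes",
    "page-2": "Managing user accounts",
    "page-3": "Search preview tab",
}
results = [name for name, text in items.items() if matches(text, "search indexes")]
print(results)
```

Note that the same tokenizer runs on both the indexed text and the search expression, which mirrors how the analyzer processes both sides.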

The following analyzers are available:

  • Simple - divides text at non-letter characters (including numbers).
  • Stop - divides text at non-letter characters (including numbers) and excludes all words in the selected Stop words dictionary.
  • White space - divides text at whitespace characters.
  • Standard - divides text based on language grammar (uses stop words, shortcuts, ...). Very efficient for English, but may not produce satisfactory results with other languages.
  • Keyword - returns the entire text stream of indexed data fields as a single token. Useful for structured data fields like zip codes or IDs.
  • Custom - allows you to assign a custom‑written analyzer. This provides a way to perform text tokenization according to your own requirements. You need to specify the names of the assembly and class where the custom analyzer is implemented. See Creating custom smart search analyzers for more information.
  • Subset - creates tokens for all possible substrings in words. Indexes with subset analyzers return results for all words that contain the search term. For example, searching for net matches words such as net, Internet, network, kinetic, etc.
  • Starts with - creates tokens for all prefixes contained in words, including the whole word. Allows searching for all words that start with the search term. For example, searching for test matches words such as test, tests, tester, etc.
  • Simple / Stop words / White space with stemming - divide text using the Simple, Stop or White space analyzer, and then reduce the tokens to word stems. Allows users to find words that have the same basic meaning as the search term, but different inflection (suffixes). Only works for English.
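
To make the differences between the basic analyzers more concrete, the following Python sketch approximates the Simple, Stop, White space and Keyword behavior described above. It is not the code the system runs; lowercasing of tokens in the Simple and Stop variants is an assumption of this sketch, and the stop word set is a made-up two-word example rather than one of the shipped dictionaries.

```python
import re

def simple_tokens(text):
    """Split at every non-letter character (digits included) and lowercase."""
    # [^\W\d_]+ matches runs of letters only.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def stop_tokens(text, stop_words):
    """Like simple_tokens, but drop words found in the stop word set."""
    return [t for t in simple_tokens(text) if t not in stop_words]

def whitespace_tokens(text):
    """Split at whitespace only; punctuation and digits stay in the tokens."""
    return text.split()

def keyword_tokens(text):
    """Return the whole field value as a single token."""
    return [text]

sample = "Kentico 8 and the smart-search"
print(simple_tokens(sample))
print(stop_tokens(sample, {"and", "the"}))
print(whitespace_tokens(sample))
print(keyword_tokens(sample))
```

In this sketch, the digit 8 disappears under the Simple and Stop variants because it is treated as a separator, while the White space variant keeps both 8 and smart-search intact.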

See also: Configuring search assistance features
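
The Subset and Starts with analyzers from the list above can be pictured as generating extra tokens for every word. The Python sketch below is an illustration under that reading, with hypothetical function names; it does not reproduce the actual implementation.

```python
def prefix_tokens(word):
    """All prefixes of the word, including the whole word (Starts with)."""
    return {word[:end] for end in range(1, len(word) + 1)}

def subset_tokens(word):
    """All contiguous substrings of the word (Subset)."""
    return {
        word[start:end]
        for start in range(len(word))
        for end in range(start + 1, len(word) + 1)
    }

words = ["net", "internet", "network", "kinetic", "tester"]
subset_hits = [w for w in words if "net" in subset_tokens(w)]
prefix_hits = [w for w in words if "net" in prefix_tokens(w)]
print(subset_hits)
print(prefix_hits)
```

A word of n letters yields n prefixes but up to n(n+1)/2 substrings, so an index built this way stores considerably more tokens than one using prefixes or whole words only.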

Stop words

Selects the stop word dictionary for Stop or Standard analyzers.

The analyzer excludes stop words (such as 'and', 'or') from the index content and uses them to divide text into tokens.

You can edit the content of the dictionaries or add new ones. The application stores the dictionaries as text files in the ~\App_Data\CMSModules\SmartSearch\_StopWords folder.

Batch size

Sets the maximum number of records that the system retrieves in a single database query when rebuilding (or creating) the index. This property allows you to optimize indexing performance.

The default value is 10. Increasing the value reduces the number of queries required for large numbers of records, which may improve performance, but also increases memory consumption.

The optimal value depends on the type (size) of the indexed objects and on the resources available in your hosting environment. When indexing large objects (e.g. documents), it is recommended to set a reasonably small batch size.
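
As a rough worked example of this trade-off, the number of queries needed for a rebuild is the record count divided by the batch size, rounded up. The record count below is hypothetical, and the calculation ignores any additional queries the system may run.

```python
import math

def query_count(record_count, batch_size):
    """Number of database queries needed to read all records in batches."""
    return math.ceil(record_count / batch_size)

records = 25_000  # hypothetical number of indexed records
for batch_size in (10, 100, 500):
    print(batch_size, query_count(records, batch_size))
```

With the default batch size of 10, this example needs 2500 queries; a batch size of 500 needs only 50, but each query then loads 500 records into memory at once.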

Crawler settings

When editing Documents crawler type indexes, you can configure the user account and domain name that the crawler uses to read pages:

Index property Description

User

Sets the user account that the crawler uses to index pages. Reading pages under a user allows the crawler to:

  • Load user-personalized content for the given user
  • Avoid indexing of documents that the user is not allowed to access

If empty, the index uses the default administrator user account.

On websites that use Windows authentication, you need to type the user name (including the Active Directory domain, in the format domain\username) and password. To guarantee that the crawler indexes under the specified Active Directory user, the covered pages cannot be accessible by public users (i.e. Windows authentication must be required).

Domain

Sets the domain that the crawler uses when indexing sites. Enter the domain name without the protocol, for example: www.domain.com

If empty, the crawler automatically uses the main domain of the site where the indexed documents belong.

For example, you can set a custom domain for web farm servers that do not have access to the main domain.
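
If you copy the value from a browser address bar, it typically includes the protocol and a trailing path. The following Python sketch shows one way to reduce such a value to the bare domain expected by the Domain property. The helper name is hypothetical, and the sketch keeps a port number if one is present.

```python
from urllib.parse import urlsplit

def crawler_domain(value):
    """Return the host part of a URL or domain string, without the protocol."""
    value = value.strip()
    # urlsplit only recognizes the host when a scheme separator is present.
    if "://" not in value:
        value = "//" + value
    return urlsplit(value).netloc

print(crawler_domain("https://www.domain.com/"))
print(crawler_domain("www.domain.com"))
```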