dotCMS Site Search uses ElasticSearch, which lets you create independent indices for different search sections of your website. Each indexing process can be scheduled to run at a different time, and results can be written to a shared index or to separate indices.
- This is an Enterprise Edition feature only. If you are using the Community Edition, click the "Try Enterprise Now" link in your dotCMS installation to preview or purchase the dotCMS Enterprise Edition.
- These indices are separate from the content search index used in previous versions.
What is included in a Site Search Index
Site Search queries all the Pages, Files and Content to create the index. It does not crawl your website. Instead, it reads the content repository on the given host and serializes the content to make it searchable. The following limitations apply to Site Search indices:
- Only content types with a URLMap are added to the site search index.
- Site Search results are NOT permissioned. The Site Search will only index pages, files or URL Mapped content with the CMS ANONYMOUS read permission. For this reason, the Site Search cannot be used to index non-public pages (pages requiring a site login to access).
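The two limitations above combine into a single eligibility rule. The following is a minimal sketch of that rule as a predicate, not dotCMS code: the `Item` class and its field names are hypothetical and exist only to model the conditions stated in the list.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a page, file, or content item."""
    kind: str                   # "page", "file", or "content"
    anonymous_can_read: bool    # CMS ANONYMOUS has read permission
    has_url_map: bool = False   # only meaningful when kind == "content"


def is_indexable(item: Item) -> bool:
    """Return True if the item meets the limitations listed above."""
    if not item.anonymous_can_read:
        return False            # non-public items are never indexed
    if item.kind in ("page", "file"):
        return True
    if item.kind == "content":
        return item.has_url_map  # content needs a URLMap on its type
    return False
```

Under this model, a public page is indexed, a page that requires a login is not, and public content is indexed only when its content type has a URLMap.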
dotCMS provides the SiteSearch viewtool, which lets you search the indices using ElasticSearch queries.
To create a new Site Search Index, select System → Site Search.
This page shows you five tabs: Indices, Search, View all Jobs, Job Scheduler and Job Audit Data.
On the Indices tab, you can create a new index by entering an Alias or Index Name and configuring the number of shards to use.
Once an Index has been created you can perform the same actions that are available on the Content Indices:
- Update number of replicas
- Restore Index
- Download Index
- Remove Default Status or Make Default
- Close Index (only for non-default indices)
- Delete Index (only for non-default indices)
To run or schedule an indexing process click on the Job Scheduler tab.
On this page a new job can be either run now or scheduled. Enter the following fields to create a job:
- Name: This is the job’s name
- Hosts: Select the hosts to index. Only live hosts can be indexed. Check "Index all Hosts" to index all of them. Note: If you want to include Content Types that don't have a Host field, you must check "Index all Hosts"; otherwise only content whose Host field matches a selected host is added to the index.
- Index name: Select from the drop down the index where the content will be added.
- Include changes from: If Incremental is checked, only changes made to content since the last indexing run are added to the index (including any new content and any edits to existing content).
- Incremental indexing will only remove content from the index if that content has been unpublished or archived.
- If you have deleted content from your site and you select Incremental, the deleted content will not be removed from the index.
- You must perform a full reindex to ensure deleted content is removed from the index.
- Language: List of all languages created in the system. Check the desired content language(s) to include in the index.
- Paths: Comma separated list of paths to include or exclude. Paths can contain wildcards, e.g. /home/*
- Cron Expression: Defines when the job will run. Some examples are included.
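The difference between an incremental run and a full reindex, as described under "Include changes from", can be modeled with a small simulation. This is a sketch of the documented behavior only, not of how dotCMS implements it: the `site` dictionary, the `modified` counter, and both function names are assumptions made for illustration.

```python
def full_reindex(site):
    """Rebuild the index from scratch using only live content."""
    return {cid: item["body"] for cid, item in site.items()
            if item["status"] == "live"}


def incremental_index(index, site, last_run):
    """Apply only the changes made after last_run to a copy of the index."""
    updated = dict(index)
    for cid, item in site.items():
        if item["modified"] <= last_run:
            continue                      # unchanged since the last run
        if item["status"] == "live":
            updated[cid] = item["body"]   # new or edited content
        else:
            updated.pop(cid, None)        # unpublished or archived content
    return updated


# Time 1: three live items, indexed with a full run.
site = {
    "a": {"status": "live", "modified": 1, "body": "A"},
    "b": {"status": "live", "modified": 1, "body": "B"},
    "c": {"status": "live", "modified": 1, "body": "C"},
}
index = full_reindex(site)

# Time 2: "b" is unpublished, "c" is deleted outright, "d" is new.
site["b"] = {"status": "unpublished", "modified": 2, "body": "B"}
del site["c"]
site["d"] = {"status": "live", "modified": 2, "body": "D"}

after_incremental = incremental_index(index, site, last_run=1)
after_full = full_reindex(site)
```

In this model the incremental run removes "b" and adds "d" but still holds the deleted item "c", because nothing in the repository refers to it any longer. Only the full run drops it.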
Once all the fields have been filled in, click "Schedule" to create the job.
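To get a feel for how a comma-separated Paths value such as `/home/*` selects pages, the sketch below applies shell-style wildcard matching from Python's standard `fnmatch` module. It assumes glob-like semantics in which `*` also matches across `/`; dotCMS may match differently, so treat it as an illustration rather than a description of the product. The `select_paths` helper and its `mode` parameter are hypothetical.

```python
from fnmatch import fnmatchcase


def select_paths(paths, patterns, mode="include"):
    """Filter paths by a comma-separated list of wildcard patterns.

    mode="include" keeps matching paths; mode="exclude" drops them.
    """
    pattern_list = [p.strip() for p in patterns.split(",") if p.strip()]

    def matches(path):
        return any(fnmatchcase(path, pat) for pat in pattern_list)

    if mode == "include":
        return [p for p in paths if matches(p)]
    return [p for p in paths if not matches(p)]


site_paths = ["/home/index", "/home/news/today", "/about/team", "/blog/post-1"]
included = select_paths(site_paths, "/home/*, /blog/*", mode="include")
excluded = select_paths(site_paths, "/home/*", mode="exclude")
```

Here `included` contains the two `/home/` paths and `/blog/post-1`, while `excluded` keeps `/about/team` and `/blog/post-1`.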
All scheduled jobs can be found on the “View All Jobs” tab.
If a Job was created to run now, it will show up on the listing as: runningOnce. When the job finishes it will disappear from the list.
To test a search on an existing index, click on the Search tab.
On this page, a query can be made on the available indices. The search results display the Score, Page Title, Author, Content Length, Url, Uri, Mime Type and Filename.
Click here for more information on possible queries to use on this page.
Note: If you need specific behaviour when Site Search is querying and reading your page, you can use the "User-Agent" request header. The "User-Agent" is set to "DOTCMS-SITESEARCH" when Site Search is accessing your page.
You can read that "User-Agent" header from your Velocity page code. Here is a way to exclude a portion of code from being included in your Site Search index: