dotCMS Site Search uses ElasticSearch, which allows you to create independent indexes for different search sections of your website. Each indexing process can be scheduled to run at a different time, and results can be indexed in common or separate indices.
- This is an Enterprise Edition feature only. If you are using the Community Edition, click the “Try Enterprise Now” link in your dotCMS installation to preview or purchase the dotCMS Enterprise Edition.
- These indices are separate from the content search index used in previous versions.
What is included in a Site Search Index
Site Search queries all the Pages, Files and Content to create the index. It does not crawl your website. Instead, it searches the content repository on the given host and serializes the content to make it searchable. The following limitations apply to Site Search indices:
- Only content types with a URLMap are added to the site search index.
- Site Search results are NOT permissioned. Site Search only indexes pages, files or URL Mapped content with the CMS ANONYMOUS read permission. For this reason, Site Search cannot be used to index non-public pages (pages requiring a site login to access).
dotCMS provides the SiteSearch viewtool, which allows you to search the indices using ElasticSearch queries.
To create a new Site Search Index, select System → Site Search.
This page shows five tabs: Indices, Search, View all Jobs, Job Scheduler and Job Audit Data.
On the Indices tab, you can create a new index by entering an Alias or Index Name and configuring the number of shards to be used.
Once an Index has been created, you can perform the same actions that are available on the Content Indices:
- Update number of replicas
- Restore Index
- Download Index
- Remove Default Status or Make Default
- Close Index (only for non-default indices)
- Delete Index (only for non-default indices)
To run or schedule an indexing process, click on the Job Scheduler tab.
On this page a new job can be either run now or scheduled. Enter the following fields to create a job:
Name: The job’s name.
Hosts: Select the hosts to index. Only live hosts can be indexed. Check “Index all Hosts” to index all of them. Note: If you want to include Content Types that don't have a Host field, you must check “Index all Hosts”; otherwise only content with the given host field will be added to the index.
Index name: Select from the drop-down the index where the content will be added.
Include changes from: If Incremental is checked, only changes made to content since the last indexing run will be added to the index (including any new content and any edits to existing content).
Incremental indexing will only remove content from the index if that content has been unpublished or archived.
If you have deleted content from your site and you select Incremental, the deleted content will not be removed from the index.
- You must perform a full reindex to ensure deleted content is removed from the index.
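The difference between an incremental run and a full reindex can be illustrated with a small model. This is only a sketch of the behavior described above, not dotCMS code: the `Content` class, the `incremental_update` and `full_reindex` functions, and the integer timestamps are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Content:
    identifier: str
    status: str    # "live", "unpublished" or "archived"
    modified: int  # simplified timestamp


def incremental_update(index, repository, last_run):
    """Apply only changes made after last_run to the existing index."""
    for item in repository.values():
        if item.modified <= last_run:
            continue  # unchanged since the last run
        if item.status == "live":
            index[item.identifier] = item     # new or edited content
        else:
            index.pop(item.identifier, None)  # unpublished or archived
    # Deleted content is no longer in the repository, so it is never
    # visited here and its stale index entry survives.


def full_reindex(repository):
    """Rebuild the index from scratch, so stale entries disappear."""
    return {i.identifier: i for i in repository.values() if i.status == "live"}


index = {k: Content(k, "live", 1) for k in ("a", "b", "c")}
repository = {
    "a": Content("a", "live", 5),         # edited
    "b": Content("b", "unpublished", 6),  # unpublished
    "d": Content("d", "live", 7),         # new; "c" was deleted
}

incremental_update(index, repository, last_run=3)
print(sorted(index))                     # ['a', 'c', 'd']
print(sorted(full_reindex(repository)))  # ['a', 'd']
```

In this model the deleted item `c` stays in the index after the incremental run and is only dropped by the full reindex.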
Language: List of all languages created in the system. Check the desired content language(s) to include in the index.
Paths: Comma separated list of paths to include or exclude. Paths can contain wildcards, e.g. /home/*
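As a rough illustration of how a comma separated list of wildcard paths can be interpreted, the sketch below uses Python's standard `fnmatch` module. It assumes shell-style matching in which `*` also matches `/`; the matcher dotCMS uses internally may differ, and `parse_paths` and `path_matches` are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def parse_paths(field: str) -> list[str]:
    """Split a comma separated Paths value into trimmed, non-empty patterns."""
    return [p.strip() for p in field.split(",") if p.strip()]


def path_matches(path: str, patterns: list[str]) -> bool:
    """Return True if the path matches at least one wildcard pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = parse_paths("/home/*, /products/*")
print(path_matches("/home/about/index.html", patterns))  # True
print(path_matches("/blog/post-1", patterns))            # False
```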
Cron Expression: Defines when the job will run. Some examples are included.
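For orientation, the sketch below labels the fields of a few cron expressions. It assumes the field accepts Quartz-style expressions (six or seven fields, starting with seconds); confirm this against the examples shown in the Job Scheduler form. The `describe` helper and the sample expressions are illustrative only.

```python
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",  # year is optional
]


def describe(expression: str) -> dict[str, str]:
    """Map each part of a Quartz-style cron expression to its field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


examples = {
    "0 0 1 * * ?": "every day at 01:00",
    "0 0/30 * * * ?": "every 30 minutes",
    "0 0 2 ? * SUN": "every Sunday at 02:00",
}

for expression, meaning in examples.items():
    print(f"{expression!r}: {meaning} -> {describe(expression)}")
```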
Once all the fields have been filled click on “Schedule” to create the job.
All scheduled jobs can be found on the “View All Jobs” tab.
If a job was created to run now, it appears in the listing as runningOnce. When the job finishes, it disappears from the list.
To test a search on an existing index, click on the Search tab.
On this page, a query can be made on the available indices. The search results display the Score, Page Title, Author, Content Length, Url, Uri, Mime Type and Filename.
Click here for more information on possible queries to use on this page.
The last tab, “Job Audit Data”, shows job information for every job execution. It contains a filter drop-down that lists the jobs whose cron expressions still have at least one execution pending. When a job is selected from the list and the Load button is clicked, the job information is displayed for each execution (e.g., for a job that runs daily, one row is shown per day, and a new row is added every day).
Note: If you require a specific behaviour when Site Search is querying and reading your page, you can use the “User-Agent” request header. The “User-Agent” will be equal to “DOTCMS-SITESEARCH” when Site Search is accessing your page.
You can read that header from your Velocity page code, for example with $request.getHeader("User-Agent"). To EXCLUDE a portion of code from your Site Search index, wrap it in an #if block that renders only when the header value is not “DOTCMS-SITESEARCH”.