Managing Site Indexes with Elasticsearch

Last Updated: Dec 18, 2023
documentation for the dotCMS Content Management System

The Index tab in the System -> Maintenance tool allows you to manage indexes for single or clustered dotCMS instances.

Screenshot of Maintenance's index pane.

Index Details

The detail area displays the following information for each index:

Field       Description
Status      Status of the index (which working/live indexes are active).
Alias Name  Identifier of the index, and whether the index is live or working.
Created     Creation timestamp of the index.
Count       Number of objects in the index.
Shards      Number of Elasticsearch shards (underlying sub-indexes) in the index.
Replicas*   Number of copies of the index (*only available on clustered instances).
Size        Size of the index, in megabytes (MB).
Health      Colored icon indicating whether the index (or an index replica) is allocated to a dotCMS instance:
              1. Green: Healthy index allocated to a dotCMS instance.
              2. Yellow: Not allocated, or not enough dotCMS instances/cluster nodes available for replication.
              3. Red: The index cannot be written to because a primary shard of the index can't be found.

Elasticsearch Shards

When you create a new index, you may specify the number of shards in the index. Elasticsearch abstracts the index so that several "shards" (sub-indexes) behave like a single index: when a query is performed, it runs against the shards and their results are aggregated. This improves performance and scalability, because Elasticsearch only updates the shard(s) it needs to instead of updating the whole index every time.
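The following minimal Python sketch models a sharded index to show this idea. It is not dotCMS or Elasticsearch code, and `ShardedIndex` and its methods are hypothetical names. Documents are assigned to shards using `zlib.crc32` of the document ID modulo the shard count. This is a simplified stand-in for Elasticsearch's own routing hash. An update writes only to the one shard that owns the document. A query visits every shard and aggregates the hits.

```python
import zlib
from typing import Dict, List


class ShardedIndex:
    """Toy model of an index split into shards (illustration only)."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each shard is an independent mapping of doc_id -> text.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, doc_id: str) -> int:
        # Deterministic routing: the same ID always lands on the same shard.
        # (Simplified stand-in for Elasticsearch's routing hash.)
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id: str, text: str) -> int:
        # An update only writes to the single shard that owns the document.
        shard_no = self.shard_for(doc_id)
        self.shards[shard_no][doc_id] = text
        return shard_no

    def search(self, term: str) -> List[str]:
        # A query must visit every shard and aggregate the per-shard hits.
        hits: List[str] = []
        for shard in self.shards:
            hits.extend(doc_id for doc_id, text in shard.items() if term in text)
        return sorted(hits)

    def count(self) -> int:
        # The index-wide count is the sum of the per-shard counts.
        return sum(len(shard) for shard in self.shards)


if __name__ == "__main__":
    idx = ShardedIndex(num_shards=3)
    idx.index("page-1", "welcome to dotCMS")
    idx.index("page-2", "elasticsearch shards")
    idx.index("page-3", "dotCMS shards demo")
    print(idx.search("shards"))  # ['page-2', 'page-3']
    print(idx.count())           # 3
```

More shards spread the writes thinner, but every search loop visits more shards. That is the trade-off discussed under Shard Performance.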

Shard Performance

Shards involve a performance trade-off. As the number of shards increases, updating the index gets faster, but queries against the index may become slightly slower, because multiple shards may need to be accessed to perform a query (especially a complex one).

General Considerations:

  • If you have a site with a large database and/or frequent content updates, you may want to consider increasing the number of shards to reduce the time it takes to re-index content.
  • If you have a site with a great deal of front-end traffic, you may want to minimize the number of shards to maximize query performance.
  • Achieving the right balance requires a needs assessment and follow-up testing.

Actions Available on Indexes

The following right-click options are available, depending on the status of the index:

Action            Index Status       Description
Clear Index       Active / Inactive  Clears the index (to prepare for a rebuild).
Deactivate Index  Active             Deactivates the index. The index retains system resources and remains available for actions such as clearing.
Activate Index    Inactive           Reactivates a deactivated index.
Close Index       Inactive           Closes the index, blocking read/write operations and caching and freeing up system resources, so it has almost no overhead on the server.
Open Index        Closed             Re-opens a closed index, leaving it in a deactivated state.
Delete Index      Inactive / Closed  Removes the index from the server. This cannot be undone.

Right-click context menu on an inactive index.

Note: Clearing or deactivating a live index displays a pop-up warning that site visibility may be affected. However, when you are troubleshooting potential index issues, these actions let you "clean" an index before restoring a copy of it. This is a much faster and lighter option than a complete site reindex for testing or resolving an issue.
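The status rules in the actions table can also be read as a small state machine. The Python sketch below encodes them so you can check which right-click actions apply to an index in a given state, and what status results. It is an illustration only. `IndexStatus`, `TRANSITIONS`, `apply_action`, and the `DELETED` end state are hypothetical names, not part of any dotCMS API. The table does not say whether Clear Index changes an index's status, so the sketch assumes it does not.

```python
from enum import Enum
from typing import Dict, List, Tuple


class IndexStatus(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    CLOSED = "Closed"
    DELETED = "Deleted"  # hypothetical end state: the index no longer exists


# (action, current status) -> resulting status, transcribed from the actions table.
TRANSITIONS: Dict[Tuple[str, IndexStatus], IndexStatus] = {
    ("Clear Index", IndexStatus.ACTIVE): IndexStatus.ACTIVE,      # assumed unchanged
    ("Clear Index", IndexStatus.INACTIVE): IndexStatus.INACTIVE,  # assumed unchanged
    ("Deactivate Index", IndexStatus.ACTIVE): IndexStatus.INACTIVE,
    ("Activate Index", IndexStatus.INACTIVE): IndexStatus.ACTIVE,
    ("Close Index", IndexStatus.INACTIVE): IndexStatus.CLOSED,
    ("Open Index", IndexStatus.CLOSED): IndexStatus.INACTIVE,     # reopens deactivated
    ("Delete Index", IndexStatus.INACTIVE): IndexStatus.DELETED,  # irreversible
    ("Delete Index", IndexStatus.CLOSED): IndexStatus.DELETED,    # irreversible
}


def available_actions(status: IndexStatus) -> List[str]:
    """Actions offered for an index in the given status."""
    return sorted(action for action, current in TRANSITIONS if current is status)


def apply_action(action: str, status: IndexStatus) -> IndexStatus:
    """Return the new status, or raise if the action is not allowed in this status."""
    try:
        return TRANSITIONS[(action, status)]
    except KeyError:
        raise ValueError(f"{action!r} is not available for an index that is {status.value}") from None


if __name__ == "__main__":
    # An active index must be deactivated before it can be closed or deleted.
    s = IndexStatus.ACTIVE
    for step in ["Deactivate Index", "Close Index", "Delete Index"]:
        s = apply_action(step, s)
    print(s)  # IndexStatus.DELETED
    print(available_actions(IndexStatus.ACTIVE))  # ['Clear Index', 'Deactivate Index']
```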

Adding an Index

To build an index, select a content option from the dropdown at the top of the Content Index Tasks subsection and click the Reindex button. The dropdown defaults to Rebuild Whole Index, which includes all content and guarantees that new indexes are created. Selecting any other option instead reindexes within the currently active index. This process runs in the background, and the index's count and size are updated when it completes.

Dropdown and Reindex button for building an index.

You will be prompted to indicate the desired number of shards, and then the process will begin.

Screenshot of shard count prompt.

Important Note: Indexes take up memory and disk space. Remove unused or old indexes using the "Delete Index" right-click option.

Index Location

Unless configured otherwise, index shards are stored in the /dotsecure/esdata/ directory from the root of dotCMS (e.g. /dotserver/tomcat-X.x.xx/webapps/ROOT/dotsecure/esdata in the default dotCMS distribution).

However, Elasticsearch allows you to store each shard in a different location, so you can distribute shards across separate folders or separate disks if desired.

To configure a different location for your Elasticsearch indexes, you must modify the DYNAMIC_CONTENT_PATH configuration property.

Note: It is strongly recommended that all changes to configuration properties be made through environment variables.

Index Replicas (For Cluster Implementations Only)

Special Note: By default, the Auto-clustering feature manages Elasticsearch replicas. The default configuration sets ES_INDEX_REPLICAS=autowire, and AUTOWIRE_CLUSTER_ES is set to true as well. The ES_INDEX_REPLICAS property may also be set to a range such as 0-all, or any other lower-upper integer boundary (e.g., 0-8). With a range, the number of index replicas automatically expands or contracts between the lower and upper boundary, based on the Elasticsearch cluster topology. This enables automated management of Elasticsearch replicas, as long as a dotCMS Enterprise license is also present on the server.

Alternatively, setting the ES_INDEX_REPLICAS property to a specific integer fixes a static number of Elasticsearch replicas. This enables manual management of clustered Elasticsearch replicas in the backend UI, as long as AUTOWIRE_CLUSTER_ES is set to false.
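To make the range syntax concrete, here is a minimal Python sketch that estimates the replica count a given ES_INDEX_REPLICAS value would produce. It assumes range values behave like Elasticsearch's own `index.auto_expand_replicas` setting, which uses the same lower-upper syntax. Under that setting, the replica count is the number of data nodes minus one, clamped to the range. The function name `effective_replicas` is hypothetical. The sketch does not model the `autowire` value (managed by dotCMS auto-clustering), and it ignores the AUTOWIRE_CLUSTER_ES flag.

```python
from typing import Optional


def effective_replicas(setting: str, data_nodes: int) -> Optional[int]:
    """Estimate the replica count implied by an ES_INDEX_REPLICAS-style value.

    Assumes range values ("0-all", "0-8") behave like Elasticsearch's
    index.auto_expand_replicas: replicas = data_nodes - 1, clamped to the range.
    Returns None for "autowire", which is not modeled here.
    """
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    value = setting.strip().lower()
    if value == "autowire":
        return None
    if "-" in value:
        lower_text, upper_text = value.split("-", 1)
        lower = int(lower_text)
        # "all" means the upper bound follows the cluster: one copy per node.
        upper = data_nodes - 1 if upper_text == "all" else int(upper_text)
        return max(lower, min(data_nodes - 1, upper))
    # A plain integer is a fixed, static replica count regardless of topology.
    return int(value)
```

For instance, with `0-all` on a three-node cluster the sketch yields 2 replicas, so every node holds a copy of the index. With `0-8` on twelve nodes it stops at the upper bound of 8.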

For more information on how to configure the management of replicas, please see the Auto Clustering Configuration documentation.

When using a cluster, you can create replicas of an index to distribute and mirror the index across multiple servers in the cluster. To change the number of replicas of an index, right-click on the index and select Update Number of Replicas.

Managing Indexes via the API

Index management actions available in the dotCMS backend can also be performed using the REST API (including via curl commands). For more information, see the RESTful API to Manage Indexes documentation.


Note on Number of Replicas


Setting the proper number of replicas for dotCMS's Elasticsearch index can be confusing. It is important to understand that the number of replicas does not equal the number of servers in your cluster.

The number of replicas is how many times you want your index to be copied. For example, if you are running a two-node cluster, you should set your Elasticsearch replicas to "1". This means there is the original index entry on one server and 1 replica, or copy, on the other server, so both servers have a copy of all index entries.

Therefore, a guideline for the proper number of replicas is:

  • For clusters with fewer than 5 servers: Set replicas to one less than the number of servers in the cluster. Examples:
    • 2 servers: 1 replica
    • 4 servers: 3 replicas
  • For clusters with 5 or more servers: Set replicas to 1/2 the number of servers in the cluster (rounded up).
    • 5 servers: 3 replicas
    • 8 servers: 4 replicas
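The guideline above is simple arithmetic, sketched here as a hypothetical Python helper. `recommended_replicas` is an illustrative name, not a dotCMS function. The helper reproduces the examples in the list and is a starting point for sizing, not a rule dotCMS enforces.

```python
import math


def recommended_replicas(servers: int) -> int:
    """Replica count suggested by the guideline (not an enforced rule)."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    if servers < 5:
        # Small clusters: one less than the server count, so every server
        # holds a full copy (the primary plus servers - 1 replicas).
        return servers - 1
    # Larger clusters: half the number of servers, rounded up.
    return math.ceil(servers / 2)


if __name__ == "__main__":
    for n in (2, 4, 5, 8):
        print(f"{n} servers -> {recommended_replicas(n)} replica(s)")
```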

When a new server joins the cluster (see the clustering documentation), Elasticsearch automatically recognizes the server and begins replicating to it. When a server is removed from the cluster or goes offline, the replicated index may display a "yellow" or inactive status if fewer nodes remain than there are copies of the index (the original plus its replicas).
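A removed server can turn an index yellow because Elasticsearch never places two copies of the same shard on the same node, so each shard needs one node per copy. The Python sketch below estimates the Health color described in Index Details from the node count and the replica count. It is a simplification under stated assumptions, and `expected_health` is a hypothetical name. It assumes every node can hold shards, and it ignores disk and allocation settings. It describes where copies *can* be allocated, not what happens to data already lost on a failed node (for example, with 0 replicas, losing a node that held a primary shard can make the index red).

```python
def expected_health(nodes: int, replicas: int) -> str:
    """Rough estimate of index health for a given cluster size (simplified model).

    Assumes every node can hold shards and that the only constraint is that
    copies of the same shard must live on different nodes.
    """
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    if nodes < 1:
        # No node can hold the primary shards, so the index cannot be written to.
        return "red"
    copies_per_shard = 1 + replicas  # the primary plus its replicas
    if nodes >= copies_per_shard:
        return "green"  # every copy can be allocated to a distinct node
    return "yellow"  # primaries are allocated, but some replicas are not
```

For example, a two-node cluster with 1 replica is green. If one node goes offline, `expected_health(1, 1)` returns "yellow", which matches the behavior described above.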

The Elasticsearch index may also be configured to run as a “stand-alone” Elasticsearch server and connect to each node in a dotCMS cluster.
