Managing Site Indexes with ElasticSearch

Index management actions available in the dotCMS backend can also be performed using curl commands. For more information, see the RESTful API to Manage Indexes documentation.

The Index tab under System -> Maintenance allows webmasters to manage indexes for single or clustered dotCMS instances.

The detail area displays:

  • Status - Shows which working/live indexes are active
  • Index Name - Shows whether the index is live or working, and the identifier of the index
  • Created - Shows the creation timestamp of the index
  • Count - Shows how many objects currently exist in the index
  • Shards - Shows how many underlying sub-indexes exist in a given index
  • Replicas* - Shows the number of copies of a given index (*only available on clustered instances)
  • Size - Shows the size of the index in megabytes
  • Health - Shows whether the index or index “replica” is being used by a dotCMS instance:
    1. Green - healthy index allocated to a dotCMS instance
    2. Yellow - not allocated or not enough dotCMS instances/cluster nodes available for replication
    3. Red - index cannot be written to because the primary shard of the index can't be found
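As a rough illustration of how the Health column can be read programmatically, the sketch below summarizes a list of index rows by health color. It assumes the rows are shaped like the JSON returned by Elasticsearch's `_cat/indices?format=json` endpoint in recent Elasticsearch versions (fields `index` and `health`), and that this endpoint is reachable in your installation, which may not be the case. The index names in the sample are made up.

```python
from collections import Counter

# Meaning of each health color, paraphrased from the list above.
HEALTH_MEANING = {
    "green": "healthy index allocated to a dotCMS instance",
    "yellow": "not allocated, or not enough nodes available for replication",
    "red": "cannot be written to because the primary shard can't be found",
}


def summarize_health(rows):
    """Count indexes per health color and list the ones that are not green."""
    counts = Counter(row["health"] for row in rows)
    needs_attention = [row["index"] for row in rows if row["health"] != "green"]
    return dict(counts), needs_attention


# Hypothetical sample rows; real rows would come from the cluster itself.
sample_rows = [
    {"index": "live_example", "health": "green"},
    {"index": "working_example", "health": "yellow"},
]

counts, needs_attention = summarize_health(sample_rows)
for name in needs_attention:
    health = next(r["health"] for r in sample_rows if r["index"] == name)
    print(f"{name}: {health} ({HEALTH_MEANING[health]})")
```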

What is a Shard?

Shards - ElasticSearch abstracts away the index so that several “shards” (sub-indexes) act like a single index: when a query runs, the results from each shard are aggregated. Indexing performance improves because ElasticSearch only updates the shard(s) it needs to, instead of updating the whole index every time. This allows for faster indexing and better scalability.

Important Note: As the number of shards goes up, indexing gets faster, but queries become slightly less performant. As a general rule, clients with larger databases may want to consider more shards to reduce re-index lag, whereas clients with lots of front-end traffic but a lighter amount of content may want fewer shards. Achieving the right balance requires a needs assessment and follow-up testing.
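The trade-off described above can be seen in a toy model. The sketch below is not how ElasticSearch is implemented; it is a minimal in-memory illustration in which each document is routed to exactly one shard by hashing its ID (so a write touches one shard), while a search has to visit every shard and merge the results. The `ShardedIndex` class and the use of `zlib.crc32` as the routing hash are assumptions made for this example.

```python
import zlib


class ShardedIndex:
    """Toy sharded index: writes hit one shard, searches fan out to all."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("an index needs at least one shard")
        # Each shard is modeled as a plain dict of doc_id -> text.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # crc32 is a stand-in for the real routing hash; it is only used
        # here because it is deterministic across runs.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        """Store a document; only the owning shard is updated."""
        shard_no = self.shard_for(doc_id)
        self.shards[shard_no][doc_id] = text
        return shard_no

    def search(self, term):
        """Query every shard and merge the per-shard hits into one result."""
        hits = []
        for shard in self.shards:
            hits.extend(doc_id for doc_id, text in shard.items() if term in text)
        return sorted(hits)
```

With more shards, each `index` call works against a smaller structure, but each `search` call has more shards to visit and merge, which mirrors the balance the note describes.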

Right-click Actions Available on any Index

The number of shards can be specified upon creating a new index.

The following right-click options are available on any index:

  • Update Number of Replicas* - changes the number of replicated indexes (*only available on clustered instances)
  • Restore Index Snapshot - allows the upload of a previous index snapshot
  • Download Index Snapshot - downloads a copy of the selected index
  • Deactivate Index (available on active indexes) - disables writing to the index
  • Close-Index - closing an index blocks it for read/write operations, so it has almost no overhead on the server
  • Open-Index - re-opening a closed index performs a normal index recovery process
  • Delete Index (available on inactive indexes) - removes the index
  • Clear Index - cleans the index to prepare for a restore

Clearing or deactivating a live index displays a popup warning that site visibility may be affected. When troubleshooting, however, these options provide the flexibility to “clean” an index before restoring a copy of that index. This is a much faster and lighter resolution method than a complete site re-index.
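Some of these actions correspond to standard Elasticsearch index operations. The sketch below only builds the HTTP method, path, and body for three of them (update replicas, close, open) using Elasticsearch's own REST endpoints; it does not send anything. It is not the dotCMS RESTful API mentioned earlier, and it assumes the Elasticsearch HTTP interface is exposed in your setup, which is not guaranteed. The function names are hypothetical.

```python
import json


def update_replicas_request(index_name, replicas):
    """Build the request that changes the replica count of an index."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    body = {"index": {"number_of_replicas": replicas}}
    return ("PUT", f"/{index_name}/_settings", json.dumps(body))


def close_index_request(index_name):
    """Build the request that closes an index (blocks reads and writes)."""
    return ("POST", f"/{index_name}/_close", None)


def open_index_request(index_name):
    """Build the request that re-opens a closed index."""
    return ("POST", f"/{index_name}/_open", None)


# "working_example" is a made-up index name used only for illustration.
method, path, body = update_replicas_request("working_example", 1)
print(method, path, body)
```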

Adding a Sharded Index

Clicking the Add-Index button adds a new working/live index and lets you define the number of shards for that index.

For example:

Step 1) Click the “Add-Index” button to create a new live/working index.

Step 2) Specify the number of shards for the new index when it is created.

Step 3) Back up the current live/working index with “Download Index Snapshot”, then restore it (“Restore Index Snapshot”) into the new index created in step 1, using the right-click options described above.
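For readers curious about what specifying shards at creation time looks like at the Elasticsearch level, here is a minimal sketch of the settings body that Elasticsearch's create-index endpoint (`PUT /<index>`) accepts. The shard count is part of the settings supplied when the index is created. The helper name and the sample index name are hypothetical, and dotCMS may pass additional settings and mappings that are not shown here.

```python
import json


def create_index_settings(shards, replicas=0):
    """Return the JSON settings body for creating an index with N shards."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return json.dumps(
        {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    )


# Example: the body that would accompany PUT /working_example
print(create_index_settings(shards=4))
```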

Important Note: Indexes do take up memory space. Unused/old indexes should be removed using the “Delete Index” right-click option.

Where do Sharded Indexes live?

Unless configured otherwise, index shards are stored in the /dotsecure/esdata/ directory under the dotCMS root. One of the nice features of ElasticSearch is that each shard can potentially be stored on a separate disk.

To configure a different location for the ElasticSearch indexes, modify the DYNAMIC_CONTENT_PATH property in your configuration file (with a plugin, of course).
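To check how much space the index data is using before deciding which old indexes to delete, something like the following sketch could be used. It assumes the default location (dotsecure/esdata under the dotCMS root) and simply adds up file sizes on disk; the function names are hypothetical, and the on-disk total may differ from the Size column shown in the Index tab.

```python
import os


def esdata_path(dotcms_root):
    """Return the default index data directory under a dotCMS root."""
    return os.path.join(dotcms_root, "dotsecure", "esdata")


def directory_size_mb(path):
    """Sum the sizes of all regular files under path, in megabytes."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip broken symlinks and other non-regular entries.
            if os.path.isfile(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / (1024 * 1024)
```

For example, `directory_size_mb(esdata_path("/path/to/dotcms"))` would report the total for the default location, where the root path is a placeholder for your own installation directory.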

Index Replicas (For Cluster Implementations Only)

Right-clicking on any index provides the option to “Update Number of Replicas” of that index.

Setting the proper number of replicas for dotCMS's ElasticSearch index can be confusing. It is important to understand that the number of replicas does not equal the number of servers in your cluster - the number of replicas is how many times you want your index to be copied. For example, if you are running a two-node cluster, you should set your ElasticSearch replicas to “1”. This means that there is the original index on one server and one replica, or copy, on the other server, so both servers have a copy of all index entries.

The “rule of thumb” for the proper number of replicas is n-1 (where n is the number of servers in the cluster) for clusters with fewer than 5 servers, and ceiling(n/2) for clusters with 5 or more servers.

For example:

2 servers = 1 replica
4 servers = 3 replicas
7 servers = 4 replicas
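The rule of thumb above can be written as a small function. This is only a direct transcription of the stated rule, with a hypothetical function name; it is not a dotCMS or ElasticSearch API.

```python
import math


def recommended_replicas(servers):
    """Rule of thumb: n-1 replicas below 5 servers, ceiling(n/2) otherwise."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    if servers < 5:
        return servers - 1
    return math.ceil(servers / 2)


# Reproduce the examples listed above.
for n in (2, 4, 7):
    print(f"{n} servers = {recommended_replicas(n)} replicas")
```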

When a new server joins the cluster (see the cluster documentation), ElasticSearch will recognize the server and begin replicating to it. When a server is taken down, the replicated index may display a “yellow” or inactive status if the number of nodes available in the cluster does not at least match the number of replicated indexes.
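The yellow status described above follows from a simple count: Elasticsearch does not place a replica on the same node as its primary, so an index with r replicas needs at least r + 1 nodes to allocate every copy. The sketch below encodes only that check. It is a simplification with a hypothetical function name, and it ignores other causes of yellow or red status, such as a missing primary shard.

```python
def expected_health(available_nodes, replicas):
    """Predict green/yellow from node count alone (simplified model)."""
    if available_nodes < 1:
        raise ValueError("at least one node must be available")
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    # One node holds the primary; each replica needs a node of its own.
    copies_needed = replicas + 1
    return "green" if available_nodes >= copies_needed else "yellow"


# A two-node cluster with 1 replica is green; losing a node turns it yellow.
print(expected_health(2, 1), expected_health(1, 1))
```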

The ElasticSearch index may even be configured to run as a “stand-alone” ElasticSearch server and connected to each node in a dotCMS cluster.