Managing Site Indexes with ElasticSearch

Index management actions that can be performed from the dotCMS backend can also be performed with curl commands. For more information, see the RESTful API to Manage Indexes documentation.

The Index tab in the System -> Maintenance tool allows you to manage indexes for single or clustered dotCMS instances.

Index Details

The detail area displays the following information for each index:

Field | Description
Status | Status of the index (which working/live indexes are active).
Index Name | Identifier of the index, and whether the index is live or working.
Created | Creation timestamp of the index.
Count | Number of objects in the index.
Shards | Number of Elasticsearch shards (underlying sub-indexes) in the index.
*Replicas | Number of copies of a given index (*only available on clustered instances).
Size | Size of the index (in megabytes).
Health | Colored icon indicating whether the index or index “replica” is being used by a dotCMS instance:
  1. Green: Healthy index allocated to a dotCMS instance.
  2. Yellow: Not allocated or not enough dotCMS instances/cluster nodes available for replication.
  3. Red: Index cannot be written to because the primary shard of the index can't be found.
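
The Health legend can be read as a small decision rule. The following is a minimal sketch of that rule, not dotCMS or Elasticsearch code. The function name and parameters are hypothetical, and it assumes that one node holds the primary copy and each replica needs a node of its own (consistent with the two-node, one-replica example in the replicas section of this page).

```python
def index_health(primary_available: bool, replicas: int, nodes: int) -> str:
    """Return 'green', 'yellow', or 'red' following the Health legend."""
    if not primary_available:
        # Red: the primary shard cannot be found, so the index cannot be written to.
        return "red"
    # Assumption: one node for the primary plus one node per replica.
    if nodes < replicas + 1:
        # Yellow: not enough nodes available to hold every replica.
        return "yellow"
    # Green: the index and all of its replicas are allocated.
    return "green"


print(index_health(True, replicas=1, nodes=2))   # green
print(index_health(True, replicas=1, nodes=1))   # yellow
print(index_health(False, replicas=1, nodes=2))  # red
```

A single, non-clustered instance with zero replicas evaluates to green under this rule, since only the primary copy needs a node.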

Elasticsearch Shards

When you create a new index, you may specify the number of shards in the index. Elasticsearch abstracts the index so that several “shards” (sub-indexes) can aggregate results when a query is performed. This makes multiple separate shards behave like a single index, but enhances performance and scalability because Elasticsearch only updates the shard(s) that it needs instead of updating the whole index every time.

Shard Performance

Shards provide a trade-off in performance. As the number of shards increases, updating the index gets faster, but queries against the index may become slightly slower (multiple shards may need to be accessed to perform the query, especially if it's a complex query).
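
To make the trade-off concrete, here is a simplified toy model of a sharded index. It is an illustration only, not how Elasticsearch is implemented. The class and function names are hypothetical, and CRC32 stands in for whatever hash Elasticsearch uses to route a document to a shard. A write touches exactly one shard, while a query visits every shard and merges the results.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document ID (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ShardedIndex:
    """Toy index: a list of dictionaries, one dictionary per shard."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def index(self, doc_id: str, doc: dict) -> int:
        """Write a document; only one shard is touched. Returns its number."""
        shard = shard_for(doc_id, len(self.shards))
        self.shards[shard][doc_id] = doc
        return shard

    def search(self, field: str, value) -> list:
        """Query every shard and merge the hits (the fan-out cost of queries)."""
        hits = []
        for shard in self.shards:
            hits.extend(d for d in shard.values() if d.get(field) == value)
        return hits


idx = ShardedIndex(num_shards=3)
for i in range(6):
    idx.index(f"content-{i}", {"type": "news" if i % 2 == 0 else "blog"})
print(len(idx.search("type", "news")))  # 3
```

In this model, adding shards makes each per-shard update smaller, while every search still has to loop over all shards, which mirrors the trade-off described above.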

General Considerations:

  • If you have a site with a large database and/or frequent content updates, you may want to consider increasing the number of shards to reduce the time it takes to re-index content.
  • If you have a site with a great deal of front-end traffic, you may want to minimize the number of shards to maximize query performance.
  • Achieving the right balance requires a needs assessment and follow-up testing.

Actions Available on Indexes

The following right-click options are available on any index:

Action | Description
Restore Index Snapshot | Replaces the current index with a previously downloaded index snapshot.
Download Index Snapshot | Downloads a copy (snapshot) of the selected index.
Close-Index | Closes an index, blocking it from performing read/write operations (so it has nearly no overhead on the server).
Open-Index | Re-opens a closed index.
Clear Index | Clears the index (to prepare for a restore).

The following additional options are available only on clustered instances:

Action | Description
Update Number of Replicas | Changes the number of replicated indexes.
Deactivate Index | Disables writing to the index.
Delete Index | Removes the index from the cluster.

Note: Clearing or deactivating a live index displays a popup warning that site visibility may be affected.

  • However, when you are troubleshooting potential index issues, these actions allow you to “clean” an index before restoring a copy of that index; this is a much faster and lighter way to test or resolve an issue than a complete site re-index.

Adding a Sharded Index

Click the Add-Index button to add a new working/live index and define the number of shards for that index.

For example:

Step 1) Click the “Add-Index” button to create a new live/working index.

Step 2) When the new index is created, specify the number of shards for that index.

Step 3) Back up the current live/working index with the “Download Index Snapshot” right-click option, then restore it into the index created in Step 1 with the “Restore Index Snapshot” right-click option.
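
For readers curious what the shard count looks like at the Elasticsearch level: Elasticsearch's create-index API accepts `number_of_shards` and `number_of_replicas` index settings, and the number of primary shards is fixed when the index is created. That is consistent with the procedure above, which creates a new index rather than changing an existing one. The sketch below only builds such a settings body. The helper name is hypothetical, and whether your dotCMS installation exposes Elasticsearch directly is an assumption this page does not confirm, so the Add-Index button remains the documented route.

```python
import json


def index_settings(shards: int, replicas: int = 0) -> str:
    """Build a JSON settings body for creating an index (hypothetical helper)."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("the number of replicas cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,      # fixed at index creation
                "number_of_replicas": replicas,  # can be changed later
            }
        }
    }
    return json.dumps(body, indent=2)


print(index_settings(shards=2, replicas=1))
```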

Important Note: Indexes do take up memory space. Unused/old indexes should be removed using the “Delete Index” right-click option.

Index Location

Unless configured otherwise, index shards are stored in the /dotsecure/esdata/ directory from the root of dotCMS (e.g. /dotserver/tomcat-X.x.xx/webapps/ROOT/dotsecure/esdata in the default dotCMS distribution).

However, Elasticsearch allows you to store each shard in a different location, enabling you to distribute shards in separate folders or on separate disks if desired.

To configure a different location for your Elasticsearch indexes, you must modify the DYNAMIC_CONTENT_PATH property in your dotmarketing-config.properties file.

Note: It is strongly recommended that all changes to the dotmarketing-config.properties file be made through a properties file extension.
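
As a rough illustration of how the property relates to the index location, the sketch below reads a properties snippet and derives the directory where shards would be stored. It rests on several assumptions that this page does not confirm: that the default value of DYNAMIC_CONTENT_PATH is `dotsecure`, that a relative value resolves against the web application root, and that the `esdata` subdirectory name stays fixed. The function names and the `/data/dotcms` path are hypothetical.

```python
from pathlib import PurePosixPath


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def es_data_dir(props: dict, webapp_root: str) -> PurePosixPath:
    """Derive the index directory (assumed layout: <dynamic path>/esdata)."""
    dynamic = PurePosixPath(props.get("DYNAMIC_CONTENT_PATH", "dotsecure"))
    # Joining with an absolute path replaces the root; a relative one is appended.
    return PurePosixPath(webapp_root) / dynamic / "esdata"


root = "/dotserver/tomcat-X.x.xx/webapps/ROOT"
print(es_data_dir({}, root))

# Hypothetical override, as it might appear in a properties file extension.
override = parse_properties("# move dynamic content\nDYNAMIC_CONTENT_PATH=/data/dotcms/dotsecure\n")
print(es_data_dir(override, root))
```

With no override, the derived path matches the default location given above. With the hypothetical absolute override, the index directory moves under `/data/dotcms/dotsecure`.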

Index Replicas (For Cluster Implementations Only)

When using a cluster, you can create replicas of an index to distribute and mirror the index across multiple servers in the cluster. To change the number of replicas of an index, right-click on the index and select Update Number of Replicas.

Note:

Setting the proper number of replicas for dotCMS's Elasticsearch index can be confusing. It is important to understand that the number of replicas does not equal the number of servers in your cluster.

The number of replicas is how many times you want your index to be copied. For example, if you are running a two-node cluster, then you should set your Elasticsearch replicas to “1”. This means that there is the original index entry on one server and 1 replica, or copy, on the other server, so both servers have a copy of all index entries.

Therefore the “rule of thumb” for the proper number of replicas is:

  • For clusters with fewer than 5 servers: Set replicas to one less than the number of servers in the cluster. Examples:
    • 2 servers: 1 replica
    • 4 servers: 3 replicas
  • For clusters with 5 or more servers: Set replicas to half the number of servers in the cluster (rounded up). Examples:
    • 5 servers: 3 replicas
    • 8 servers: 4 replicas
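
The rule of thumb above can be written as a short function. This is a minimal sketch of the rule exactly as stated; the function name is hypothetical.

```python
import math


def recommended_replicas(servers: int) -> int:
    """Apply the rule of thumb for the number of replicas in a cluster."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    if servers < 5:
        # Fewer than 5 servers: one less than the number of servers.
        return servers - 1
    # 5 or more servers: half the number of servers, rounded up.
    return math.ceil(servers / 2)


for n in (2, 4, 5, 8):
    print(n, "servers:", recommended_replicas(n), "replica(s)")
# 2 servers: 1 replica(s)
# 4 servers: 3 replica(s)
# 5 servers: 3 replica(s)
# 8 servers: 4 replica(s)
```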

When a new server joins the cluster (see cluster doc), Elasticsearch automatically recognizes the server and begins replicating to it. When a server is removed from the cluster or goes offline, the replicated index may display a “yellow” or inactive status if the number of nodes remaining in the cluster no longer at least matches the number of replicated indexes.

Elasticsearch may also be configured to run as a “stand-alone” server, with each node in a dotCMS cluster connecting to it.