Troubleshooting Performance documentation for the dotCMS Content Management System

Step 1 - Validate/Optimize Configuration

Tomcat

Xms, Xmx

Your Java heap settings, -Xms and -Xmx, are the single most important parameters when tuning dotCMS. For best results, set -Xms equal to -Xmx.

As a general rule of thumb, the dotCMS process should be given 50-70% of your server's memory, up to about 23GB of heap. Beyond that, garbage collection (GC) overhead and heap fragmentation can slow servers down.
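
The rule of thumb above can be turned into a quick starting-point calculation. This is a minimal sketch that assumes the 50-70% range and ~23GB cap stated above; recommended_heap_gb and jvm_heap_flags are hypothetical helpers, not part of dotCMS, and any result still needs to be validated under real load.

```python
def recommended_heap_gb(server_ram_gb: float, fraction: float = 0.6,
                        cap_gb: float = 23.0) -> float:
    """Suggest one heap size to use for both -Xms and -Xmx.

    Applies the rule of thumb: give the dotCMS JVM 50-70% of the server's
    RAM, but no more than about 23GB of heap.
    """
    if not 0.5 <= fraction <= 0.7:
        raise ValueError("fraction should be between 0.5 and 0.7")
    return min(server_ram_gb * fraction, cap_gb)


def jvm_heap_flags(heap_gb: float) -> str:
    """Render matching -Xms/-Xmx flags, rounded down to whole gigabytes."""
    size = f"{int(heap_gb)}g"
    return f"-Xms{size} -Xmx{size}"


if __name__ == "__main__":
    for ram in (16, 32, 64):
        print(ram, "GB RAM ->", jvm_heap_flags(recommended_heap_gb(ram)))
```

Setting -Xms equal to -Xmx in the output mirrors the advice above, so the JVM starts with its full heap.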

Tomcat Threads

Database

DB Connections

Configure your database to allow up to one connection per Tomcat thread. For example, three dotCMS servers with 200 Tomcat threads each should connect to a database server that allows up to ~600 connections. This is only a precaution; dotCMS should not normally use that many database connections.
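
The connection ceiling is simple arithmetic, sketched below. It assumes db_max_connections is whatever limit your database enforces (for example, max_connections in PostgreSQL); the helper names are hypothetical.

```python
def required_db_connections(servers: int, tomcat_threads_per_server: int) -> int:
    # Precautionary ceiling: one DB connection per Tomcat thread,
    # summed over every dotCMS server in the cluster.
    return servers * tomcat_threads_per_server


def db_limit_ok(db_max_connections: int, servers: int,
                tomcat_threads_per_server: int) -> bool:
    """True if the database allows at least one connection per Tomcat thread."""
    return db_max_connections >= required_db_connections(
        servers, tomcat_threads_per_server)


if __name__ == "__main__":
    print(required_db_connections(3, 200))  # 600, the example from the text
    print(db_limit_ok(500, 3, 200))         # False: raise the database limit
```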

DB Connection Timeouts

dotCMS performs long-running transactions when importing or push-publishing a large number of content objects (or a whole site) in a single bundle. If your database is configured to time out connections automatically, the timeout should be longer than your longest expected import transaction.

Cache

How dotCMS caches objects

It is important to understand how dotCMS caching works. dotCMS uses “CacheProviders” to minimize database and Elasticsearch queries that could otherwise overburden your database or Elasticsearch instances. CacheProviders offer a variety of strategies for storing dotCMS objects: fixed-size in-memory caches, timed in-memory caches, on-disk caches and remote networked caches. Each CacheProvider takes its own configuration. Everything in dotCMS’s front end is cached; when a dotCMS instance is running with a fully loaded cache, it can serve pages and content even during a database outage.

dotCMS caches objects in regions, based on the type of object being cached. These caches are populated lazily, only after an object has been retrieved from the database, and each dotCMS instance in a cluster maintains its own cache (unless you are using a networked cache such as Redis). Each cache region can be configured as a chain of CacheProviders, meaning each CacheProvider for a given region is queried in turn for a valid cached entry. Order your cache chain from fastest to slowest: the fastest CacheProvider is queried first, the slowest last. If no valid cache entry is found, dotCMS queries the database and places the freshly loaded object into all the providers for that cache region. When an object in dotCMS is updated (say, a piece of content is edited), dotCMS invalidates that object in all relevant cache regions and providers. dotCMS also sends an invalidation message to the other dotCMS servers in the cluster so they invalidate the object from their caches as well. This invalidation forces the next request for the object to be loaded from the database.
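
To make the lookup order concrete, here is a toy model of a single cache region backed by a chain of providers. It illustrates the behavior described above and is not dotCMS's actual CacheProvider API: plain dicts stand in for providers, load_from_db is a hypothetical loader, and cluster-wide invalidation messages are not modeled.

```python
from typing import Any, Callable, Dict, List


class CacheChain:
    """Toy model of one cache region backed by an ordered chain of providers.

    Each provider is a plain dict standing in for a real CacheProvider,
    ordered fastest (e.g. in-memory) to slowest (e.g. disk or networked).
    """

    def __init__(self, providers: List[Dict[str, Any]],
                 load_from_db: Callable[[str], Any]) -> None:
        self.providers = providers
        self.load_from_db = load_from_db
        self.db_hits = 0

    def get(self, key: str) -> Any:
        # Query each provider in order; the first valid entry wins.
        for provider in self.providers:
            if key in provider:
                return provider[key]
        # Miss in every provider: load from the database once, then
        # place the fresh object into all providers for this region.
        self.db_hits += 1
        value = self.load_from_db(key)
        for provider in self.providers:
            provider[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # After an update, drop the key from every provider so the next
        # get() is forced back to the database.
        for provider in self.providers:
            provider.pop(key, None)


if __name__ == "__main__":
    memory, disk = {}, {}
    region = CacheChain([memory, disk], load_from_db=lambda k: f"contentlet {k}")
    region.get("abc")        # miss everywhere, so the database is queried
    region.get("abc")        # served from the in-memory provider
    region.invalidate("abc")
    region.get("abc")        # reloaded from the database
    print(region.db_hits)    # 2
```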

To learn more about the different caching providers available in dotCMS, see: Cache Configuration

Cache Tuning

dotCMS ships with a cache tuned for a small to mid-sized site. Most sites will want to adjust the cache sizes and providers for the various regions, depending on their dotCMS usage patterns.

Most cached objects are small, under 1KB in size. A few regions hold larger objects that can take up significantly more Java heap memory. Generally, these regions should be sized conservatively relative to your configured heap size and backed by an off-heap CacheProvider, e.g. an on-disk or networked provider such as H22 or Redis, to prevent unneeded database lookups. These heavy regions can include:

  • Contentlet Cache: ~15KB per object
  • Velocity Cache: ~100KB per object
  • Block Cache / HTML Page Cache: ~100KB per object
  • CSS Cache
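
A rough way to see why these regions deserve conservative sizing is to multiply the maximum number of entries by the approximate object size. This sketch uses the per-object estimates above; the region keys and entry counts are hypothetical, and real object sizes vary by implementation.

```python
# Approximate per-object sizes from the list above, in kilobytes.
APPROX_KB_PER_OBJECT = {
    "contentlet": 15,
    "velocity": 100,
    "block/html page": 100,
}


def region_footprint_mb(region: str, max_entries: int) -> float:
    """Rough heap used by a full region: entries x approximate object size."""
    return max_entries * APPROX_KB_PER_OBJECT[region] / 1024


if __name__ == "__main__":
    # Hypothetical region sizes; substitute your own configured values.
    sizes = {"contentlet": 20_000, "velocity": 5_000, "block/html page": 2_000}
    total = 0.0
    for region, entries in sizes.items():
        mb = region_footprint_mb(region, entries)
        total += mb
        print(f"{region:>16}: {entries:>6} entries ~ {mb:,.0f} MB")
    print(f"{'total':>16}: ~{total:,.0f} MB of heap")
```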

When tuning your dotCMS in-memory cache, e.g. CaffeineCache, start by letting your site run for a while so the cache has a chance to warm up. (Pro tip: you can run a one-time site search job that indexes all sites and content to help fill various cache regions.) Once your implementation is warm, go to the System > Maintenance > Cache screen, show your cache regions, and sort by evictions, descending. You are looking at three numbers: the load, the hit rate and the number of evictions. If a region has no evictions, it can hold all relevant objects in memory and needs no tuning. If you see evictions, look at the load and the hit rate. If the load is high, the hit rate is low and you have heap memory to spare, you can probably increase the size of that region so that the evictions stop. This generally results in fewer database lookups and better performance. If the region holds large objects, consider backing it with an off-heap CacheProvider.
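
The eviction checklist above can be written down as a small decision function. This is a sketch only: RegionStats and tuning_advice are hypothetical, the numbers would be read by hand from the Cache screen, and the high_load and low_hit_rate thresholds are placeholder assumptions, since the text deliberately gives no formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionStats:
    name: str
    load: int            # the "load" figure shown for the region
    hit_rate: float      # 0.0 to 1.0
    evictions: int
    large_objects: bool = False


def tuning_advice(stats: RegionStats, heap_to_spare: bool,
                  high_load: int = 10_000, low_hit_rate: float = 0.8) -> List[str]:
    """Apply the checklist to one region. Thresholds are placeholders."""
    if stats.evictions == 0:
        return ["no tuning needed"]
    advice = []
    if stats.load >= high_load and stats.hit_rate < low_hit_rate and heap_to_spare:
        advice.append("increase region size until evictions stop")
    if stats.large_objects:
        advice.append("consider backing with an off-heap provider (H22, Redis)")
    return advice or ["evictions present; keep monitoring"]


if __name__ == "__main__":
    # Hypothetical readings for two regions.
    regions = [
        RegionStats("contentletcache", load=50_000, hit_rate=0.55,
                    evictions=12_000, large_objects=True),
        RegionStats("permissioncache", load=800, hit_rate=0.99, evictions=0),
    ]
    # Mirror the Cache screen: sort by evictions, descending.
    for r in sorted(regions, key=lambda r: r.evictions, reverse=True):
        print(r.name, "->", tuning_advice(r, heap_to_spare=True))
```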

Bottom line: how much to increase the size of a region is not based on a formula. It is trial and error, and depends on factors such as:

  • Is this a production-only instance, or does authoring also take place on it?
  • Is your implementation API-driven or page-driven?
  • How many unique URLs do you have?
  • What is the average size of the objects in the region?
  • How much Java heap memory is available?
  • Is the region backed by a second-tier CacheProvider such as H22 or Redis?

While the above should give you a start on how to tune dotCMS caches, only by performance testing various configurations can you arrive at the optimal configuration for your implementation.

Elasticsearch Tips

  • Minimize the number of indexed fields. From a performance perspective, marking a field as indexed is not “free”; only fields you want to search by should be marked as indexed. Each field marked as indexed in dotCMS adds overhead to the reindexing process and to the memory Elasticsearch’s field cache needs to make that field searchable. In fact, to maintain performance, Elasticsearch limits an index to 1000 fields by default (the index.mapping.total_fields.limit setting), though this can be changed in configuration (search for “Elasticsearch mapping explosion”). To offer maximum flexibility, dotCMS raises this limit considerably, but keep the underlying principle in mind: only index fields you know you will need to search by.

  • Delete unused indexes (each open index takes server memory).

  • Keep in mind that an Elasticsearch query is like a database query: each query adds a (small) amount of load and latency. A page that unnecessarily makes tens or hundreds of Elasticsearch queries creates large loads and can overburden your dotCMS or Elasticsearch instances. dotCMS caches the results of all Elasticsearch content queries, but this cache is invalidated whenever a content object is saved or updated.

  • Size and tune your Elasticsearch cluster appropriately. There are numerous articles online on how to do this; an undersized Elasticsearch instance can negatively affect performance.
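
To get a feel for how quickly indexed fields add up, the sketch below counts the fields in an index mapping and compares the total with Elasticsearch's default index.mapping.total_fields.limit of 1000. It assumes the input is the “properties” object returned by Elasticsearch's GET <index>/_mapping API. The count is approximate (field aliases and other mapping features are ignored), the helper name is hypothetical, and because dotCMS raises this limit you would compare against your configured value.

```python
from typing import Any, Dict

# Elasticsearch's default for index.mapping.total_fields.limit.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000


def count_mapped_fields(properties: Dict[str, Any]) -> int:
    """Roughly count the fields in a mapping's "properties" section.

    Every field and object mapping counts once; object children ("properties")
    and multi-fields ("fields") are counted recursively.
    """
    total = 0
    for spec in properties.values():
        total += 1
        total += count_mapped_fields(spec.get("properties", {}))
        total += count_mapped_fields(spec.get("fields", {}))
    return total


if __name__ == "__main__":
    # Hypothetical mapping fragment for a single content type.
    properties = {
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "author": {"properties": {"name": {"type": "text"},
                                  "id": {"type": "keyword"}}},
        "body": {"type": "text"},
    }
    used = count_mapped_fields(properties)
    print(f"{used} fields ({used / DEFAULT_TOTAL_FIELDS_LIMIT:.1%} of the default limit)")
```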

Velocity

  • Velocity is among the most flexible and performant Java scripting languages that allow templates and context to be defined at runtime. dotCMS has customized a number of Velocity tools that help generate dynamic, integrated and personalized experiences, but it is important to know how to use them well to ensure performant content delivery.

  • Learn about the #dotCache directive and how to use it as a block cache around remote calls or expensive queries. Use #dotCache to wrap any remote calls, such as those made with the $json tool, as well as any heavy or complex content queries that run consistently, e.g. when rendering a site’s header or footer. Once the cache block has been loaded, the wrapped calls perform like static content.

  • Prefer JavaScript/client-side connections for JSON, XML or injecting HTML. While Velocity’s remote $json, $xml and $import tools are very convenient, using them uncached can be perilous for a heavily trafficked site (see the #dotCache tip above). We have seen numerous cases where a slow response from a remote JSON endpoint takes down a dotCMS server as requests and server threads back up waiting for that endpoint to respond. A page is only as reliable and performant as the sum of all the remote calls it makes.

  • Use page cache for highly trafficked pages. Pages served from cache can be 10x more performant than pages generated dynamically. For pages that are hit often, set the page cache as long as the business can tolerate. For example, a news site’s home page is likely to be news- and content-heavy, which can generate a large load. Setting the home page cache to, say, 3 minutes greatly reduces the load of generating that page dynamically for every request, and content can still be brought in dynamically via API or JavaScript calls.

  • Paginate results server side. If you expect a large quantity of content from a content pull, paginate it server side so as not to overburden your server.
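
The idea behind wrapping a remote call or heavy query in #dotCache is a time-bounded block cache. The Python sketch below illustrates that idea only; it is not how #dotCache is implemented, and BlockCache, get_or_render and the injectable clock are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class BlockCache:
    """Minimal TTL cache: an expensive render (remote JSON fetch, heavy
    content query) runs once, then its output is reused until it expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._clock = clock

    def get_or_render(self, key: str, ttl_seconds: float,
                      render: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # still fresh: serve cached output
        value = render()                     # missing or expired: do the work once
        self._entries[key] = (now + ttl_seconds, value)
        return value


if __name__ == "__main__":
    calls = []

    def fetch_footer() -> str:
        calls.append(1)                      # stands in for a slow remote call
        return "<footer>...</footer>"

    fake_now = [0.0]
    cache = BlockCache(clock=lambda: fake_now[0])
    cache.get_or_render("site-footer", 60, fetch_footer)
    cache.get_or_render("site-footer", 60, fetch_footer)   # served from cache
    fake_now[0] = 61.0
    cache.get_or_render("site-footer", 60, fetch_footer)   # expired, re-rendered
    print(len(calls))  # 2
```

This sketch does not stop several concurrent requests from re-rendering the same expired block at once; a production cache would need to handle that.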

Images and Static Assets

Image Transformation / Manipulation

If you are running in a cluster and doing a large amount of image transformations, consider relocating the /assets/dotGenerated folder to a folder that is local to each instance. You can do this by replacing that folder with a symlink from /assets/dotGenerated to a folder under dotsecure, e.g. /dotSecure/dotGenerated.
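
A sketch of the symlink swap, assuming you pass in the paths yourself (for example, your assets folder and a folder under dotsecure on local disk); relocate_generated is a hypothetical helper. It renames the existing dotGenerated folder to dotGenerated.bak instead of deleting it, and since each instance needs its own local folder, it would be run on every node.

```python
import os
from pathlib import Path


def relocate_generated(assets_dir: Path, local_target: Path) -> None:
    """Replace <assets>/dotGenerated with a symlink to a node-local folder.

    The existing folder is renamed to dotGenerated.bak rather than deleted.
    """
    generated = assets_dir / "dotGenerated"
    local_target.mkdir(parents=True, exist_ok=True)
    if generated.is_symlink():
        return  # already relocated on this node
    if generated.exists():
        generated.rename(assets_dir / "dotGenerated.bak")
    os.symlink(local_target, generated, target_is_directory=True)
```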

Use a CDN to cache assets/pages and api calls

Step 2 - Check Memory Sizing and Garbage Collection

If you are experiencing performance issues, the first thing to do is ensure that your Java server’s -Xmx memory is sufficient. Memory usage can be seen on the System > Maintenance > Cache screen; refresh the stats a few times. On a normally loaded site, you should see the “Filled Memory” creep up toward the “Total Memory Available” and then, as garbage collection (GC) kicks in, return to a much lower baseline value. This cycle is normal and repeats as dotCMS serves pages and content: the Java objects used to build pages and content fill up the heap, then are garbage collected (reclaimed) once they are no longer referenced.

If the “Filled Memory” stays close to the “Total Memory Available” even after refreshing the screen over the course of a few minutes, your dotCMS may be under memory pressure, which can lead to degraded performance, Full garbage collections, and outages.
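
The “Filled Memory” check can be sketched as: collect several readings over a few minutes and see whether usage ever drops back to a lower baseline. This assumes you record the readings yourself from the Cache screen; the 90% threshold and the function names are hypothetical.

```python
from typing import Sequence


def post_gc_baseline(filled_mb: Sequence[float]) -> float:
    """Lowest "Filled Memory" reading across several refreshes; a rough
    stand-in for the baseline the heap returns to after GC."""
    return min(filled_mb)


def under_memory_pressure(filled_mb: Sequence[float], total_mb: float,
                          threshold: float = 0.9) -> bool:
    """True if "Filled Memory" never fell below threshold x total,
    i.e. GC never brought usage back down to a lower baseline."""
    return post_gc_baseline(filled_mb) >= threshold * total_mb


if __name__ == "__main__":
    total = 8192  # "Total Memory Available", in MB
    healthy = [3100, 5200, 7600, 2900, 6100]   # saw-tooth: drops after GC
    pressured = [7900, 8050, 7950, 8100]       # stays pinned near the top
    print(under_memory_pressure(healthy, total))    # False
    print(under_memory_pressure(pressured, total))  # True
```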

Full Garbage Collection, aka Full GC

Full GC stops ALL Java threads for up to a few seconds while it tries to remove unreferenced objects from memory. While your dotCMS is in a Full GC, all dotCMS requests and responses pause until the Full GC finishes, so Full GCs cause a site outage of a few seconds. Bottom line: Full GCs are bad and can mean your dotCMS does not have enough heap memory (-Xmx) to fulfill all requests. Full GCs can also indicate that your dotCMS is heading toward a terminal “Out of Memory” error, which will take your dotCMS instance down. Generally, a properly sized dotCMS server should have enough heap memory to run without ever executing a Full GC. To see whether your server is hitting Full GC, turn on GC logging by adding a flag to the java command that runs dotCMS, e.g. -XX:+PrintGC, or -XX:+PrintGCDetails for more information. (On Java 9 and later these flags are deprecated in favor of unified logging, e.g. -Xlog:gc or -Xlog:gc*.) GC logging gives you insight into how much time your server spends in garbage collection and can help you identify whether your server is hitting Full GC.
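
Once GC logging is on, a short script can tally Full GC events and their pause time. This sketch assumes the default line formats of JDK 9+ unified logging (“Pause Full ... 1234.5ms”) and JDK 8 -XX:+PrintGC/-XX:+PrintGCDetails output (“[Full GC ... , 1.23 secs]”); exact formats vary by JVM version and collector, so treat the regular expressions as a starting point. summarize_full_gcs is a hypothetical helper.

```python
import re
from typing import Iterable, Tuple

# JDK 9+ unified logging, e.g. "... Pause Full (G1 Compaction Pause) 3900M->2100M(4096M) 1875.402ms"
UNIFIED_FULL = re.compile(r"Pause Full\b.*\s([\d.]+)ms\s*$")
# JDK 8 PrintGC/PrintGCDetails, e.g. "[Full GC (Allocation Failure) ..., 1.2345678 secs]"
LEGACY_FULL = re.compile(r"\[Full GC.*?, ([\d.]+) secs\]")


def summarize_full_gcs(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (number of Full GC events, total Full GC pause in milliseconds)."""
    count, total_ms = 0, 0.0
    for line in lines:
        m = UNIFIED_FULL.search(line)
        if m:
            count += 1
            total_ms += float(m.group(1))
            continue
        m = LEGACY_FULL.search(line)
        if m:
            count += 1
            total_ms += float(m.group(1)) * 1000  # seconds to milliseconds
    return count, total_ms
```

For example, summarize_full_gcs(open("gc.log")) returns the number of Full GCs and the total paused milliseconds for a log written with -Xlog:gc:file=gc.log. Any nonzero count on a warmed-up server is worth investigating.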

Once your cache is warmed up and your site has been under load for a while, note the baseline value of the “Filled Memory” after garbage is collected. That roughly represents the memory your live Java objects need, including the resident objects dotCMS holds in its in-memory cache. If there is a large gap between the baseline “Filled Memory” and the “Total Memory Available”, you have room to increase the size of in-memory cache regions that are experiencing evictions.

Step 3 - Take Thread Dumps

If the server is slow but has plenty of memory, the next step in figuring out why is to take thread dumps. Thread dumps can be taken from the System > Maintenance > Threads tool or from the command line. To take a thread dump with jstack, you need the Java process id, e.g. jstack -l <pid>. You can also run kill -3 <pid> or killall -3 java, which prints a thread dump to Java’s standard out, generally catalina.out. To use thread dumps effectively, take multiple thread dumps over a period of time, say one every ten seconds for a minute. A thread dump is only a snapshot of what Java is running at one moment; you are looking for trends or patterns (common stack traces) that show up across multiple dumps and indicate where processing time is going.
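
Taking one dump every ten seconds for a minute is easy to script. This sketch shells out to jstack -l <pid> as described above and writes each dump to its own file; take_thread_dumps and the file naming are hypothetical, and jstack typically has to run as the same OS user as the dotCMS JVM.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional, Sequence


def take_thread_dumps(pid: int, count: int = 6, interval_s: float = 10.0,
                      out_dir: Path = Path("."),
                      command: Optional[Sequence[str]] = None) -> List[Path]:
    """Run `jstack -l <pid>` `count` times, `interval_s` seconds apart,
    saving each dump to its own file so the dumps can be compared later.

    `command` overrides the jstack invocation (useful for testing).
    """
    cmd = list(command) if command else ["jstack", "-l", str(pid)]
    out_dir.mkdir(parents=True, exist_ok=True)
    files = []
    for i in range(count):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        path = out_dir / f"threaddump-{pid}-{i:02d}.txt"
        path.write_text(result.stdout)
        files.append(path)
        if i < count - 1:
            time.sleep(interval_s)  # no sleep after the final dump
    return files
```

The defaults (six dumps, ten seconds apart) match the “one every ten seconds for a minute” guideline.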

1. Look for dotCMS threads

A thread dump is made up of the stack traces of every thread dotCMS/Java is running. Most of these threads are waiting for work and can be ignored, e.g. idle Tomcat, Quartz and Elasticsearch threads. What matters are threads in the RUNNABLE or BLOCKED state, or holding locks (shown as “- locked” lines), that have com.dotcms or com.dotmarketing class names in their stacks. These threads show where dotCMS is doing work and help uncover the underlying cause of performance issues.

2. Look for many threads that are stuck in similar places/states over time.

If you see multiple threads in a thread dump stuck in the same place, e.g. the top line of their stacks is the same, that indicates a performance bottleneck. For example, you might see a number of threads stuck doing I/O work, e.g. in java.io or java.nio. This can happen if the disk or NFS share is too slow to serve all the disk reads needed for files, pages and content. As another example, you might see a number of threads stuck in HttpClient calls made by com.dotcms.rendering.velocity.viewtools.JSONTool; this indicates that your site is making external HTTP requests that are not responding in a timely manner and are causing threads to block. If you see the same stacks in multiple thread dumps taken over time, those threads and the underlying systems or implementation code involved should be examined and optimized (increase disk IOPS, wrap Velocity code in a cache, offload custom work to another system).
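
Steps 1 and 2 can be partly automated. The sketch below parses standard jstack output, keeps threads in the RUNNABLE or BLOCKED state whose stacks contain com.dotcms or com.dotmarketing frames, and counts their top frames across dumps, reporting only frames seen in at least two dumps. It is a hypothetical helper, not a dotCMS tool; it ignores lock ownership and keys on the exact top-frame text, line numbers included.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

INTERESTING_STATES = {"RUNNABLE", "BLOCKED"}
DOTCMS_PACKAGES = ("com.dotcms", "com.dotmarketing")
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def parse_threads(dump: str) -> List[Dict]:
    """Split one jstack dump into threads with a name, state and stack frames."""
    threads: List[Dict] = []
    current = None
    for line in dump.splitlines():
        if line.startswith('"'):                      # thread header line
            current = {"name": line.split('"')[1], "state": None, "frames": []}
            threads.append(current)
        elif current is not None:
            stripped = line.strip()
            m = STATE_RE.search(stripped)
            if m:
                current["state"] = m.group(1)
            elif stripped.startswith("at "):
                current["frames"].append(stripped[3:])
    return threads


def hot_spots(dumps: Iterable[str], min_dumps: int = 2) -> Counter:
    """Count top frames of busy dotCMS threads; keep frames that appear
    in at least `min_dumps` separate dumps."""
    per_frame: Counter = Counter()   # busy threads per top frame, all dumps
    seen_in: Counter = Counter()     # number of dumps each top frame appeared in
    for dump in dumps:
        frames_this_dump = set()
        for t in parse_threads(dump):
            if t["state"] not in INTERESTING_STATES or not t["frames"]:
                continue
            if not any(f.startswith(DOTCMS_PACKAGES) for f in t["frames"]):
                continue
            top = t["frames"][0]
            per_frame[top] += 1
            frames_this_dump.add(top)
        seen_in.update(frames_this_dump)
    return Counter({f: n for f, n in per_frame.items() if seen_in[f] >= min_dumps})
```

For example, hot_spots(Path(p).read_text() for p in dump_files).most_common(10) lists the ten most frequent hot spots, where dump_files is your list of saved dump paths and Path comes from pathlib.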

Step 4 - Install an APM/Glowroot

If nothing stands out after reviewing thread dumps, the next recommendation is to install an Application Performance Monitoring (APM) system such as New Relic or the open source Glowroot. These systems can give deep insight into the performance issues that can affect large platforms such as dotCMS and help identify code bottlenecks. Installing and using such systems is outside the scope of this document, but they can be very helpful in identifying Java memory and performance issues.