How content is mapped to Elasticsearch

All content has properties that are indexed and made searchable by Elasticsearch. Some dotCMS properties are shared by all Content Types; others are specific to a Content Type and are defined when you mark a field as System Indexed.


  • All variable keys and content are converted to lower case before they are added to the Elasticsearch index.
    • So when you create an Elasticsearch query, all field names and values should be written in lower case.
    • For example, when searching for news items tagged with “Singapore”, the query term should not capitalize the tag name, e.g. news.tags:singapore.
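
As a minimal sketch of the lower-case rule above, the hypothetical helper below (it is not part of any dotCMS API) normalizes a single field:value term before it is placed in a query. It lower-cases only the field name and the value rather than a whole query string, because Lucene boolean operators such as AND and OR must stay in upper case. Quoting of multi-word values is left out of this sketch.

```python
def query_term(field: str, value: str) -> str:
    """Build a lower-cased `field:value` term (hypothetical helper)."""
    return f"{field.strip().lower()}:{value.strip().lower()}"


if __name__ == "__main__":
    # The tag was entered as "Singapore", but the index stores it in lower case.
    print(query_term("news.tags", "Singapore"))  # prints news.tags:singapore
```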

The following sections describe how dotCMS content is mapped in Elasticsearch.

System Properties

The following properties exist in all dotCMS Content Types. Each of these properties can be accessed (either in the query terms or the sort field) by specifying the property name only (e.g. “title”), without reference to the name of the content type.

| Property | Type | Description |
|---|---|---|
| title | string | The first text field marked “Show in Listing”. |
| contenttype | string | Variable name of the Content Type. |
| basetype | int | Enumerated value of the base Content Type: content=1, widget=2, form=3, file=4, page=5. |
| live | bool | True if the content is live (published) on your site. |
| working | bool | True if the content is a working (unpublished) version. |
| locked | bool | True if the content is locked. |
| deleted | bool | True if the content has been archived. |
| languageid | int | Language id of the content. |
| identifier | string | Identifier of the content item itself. |
| conhost | string | Identifier of the host the content is on. |
| confolder | string | Identifier of the folder the content is in. |
| parentpath | string | Path to the folder the content is in. |
| path | string | Path/URL to the content. |
| wfcreatedBy | string | The userid of the user who created the current workflow. |
| wfassign | string | The userid of the user who is assigned the current workflow. |
| wfstep | string | The GUID of the workflow step the content is currently in. |
| wfModDate | string | Date the content was moved to its current workflow step. |
| pubdate | long | Date the content was published, formatted numerically as yyyyMMddHHmmss. |
| expdate | long | Date the content will expire, formatted numerically as yyyyMMddHHmmss. |
| urlmap | string | The URL map of the content (if any). |
| categories | string | The variable names of the categories the content is assigned to. |
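
To illustrate how system properties can be used without a Content Type prefix, here is a small sketch. Both function names are hypothetical and chosen for illustration only. The first formats a date in the numeric yyyyMMddHHmmss form described for pubdate and expdate. The second combines a few system properties into a Lucene-style query string, where the leading "+" is Lucene's "required" operator.

```python
from datetime import datetime


def index_date(moment: datetime) -> int:
    """Format a datetime as the numeric yyyyMMddHHmmss value used by pubdate/expdate."""
    return int(moment.strftime("%Y%m%d%H%M%S"))


def system_query(content_type: str, live: bool = True, deleted: bool = False) -> str:
    """Combine a few system properties into one Lucene-style query string.

    The property names are used on their own, with no content type prefix.
    """
    return (
        f"+contenttype:{content_type.lower()} "
        f"+live:{str(live).lower()} "
        f"+deleted:{str(deleted).lower()}"
    )


if __name__ == "__main__":
    print(system_query("News"))
    print(index_date(datetime(2024, 3, 5, 14, 30, 0)))
```

A value produced by index_date could then be used as a bound when filtering on pubdate or expdate.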

Content Type Specific Fields

Indexed and Non-Indexed Fields

Any field of a Content Type which has the System Indexed property set may be accessed from within an Elasticsearch query, and query results may be sorted by these fields.

Fields which are not System Indexed may be accessed from the objects returned in the search results (when using a content pull for example), but may not be included as part of the query or sort field. Attempts to query or sort by non-indexed fields will usually produce no search results.

Field Syntax

You can query any fields added to a Content Type using the pattern {content type variable name}.{field variable name} (for example, news.headline or news.tags).

  • Content Types are referenced within Elasticsearch queries by the variable name of the Content Type.
    • For example, when querying the “Event” Content Type, your query must reference “calendarEvent” (the variable name of the Content Type) rather than “Event” (the display name of the Content Type).
  • Similarly, the fields of a Content Type are referenced within Elasticsearch queries by the variable name of the field (not the field Label or Alias Name).
    • For example, the name used to reference the “Start Date” field of the “Event” Content Type is “startDate” (the variable name of the field), not “Start Date” (the Label of the field).
  • Finally, when accessing any field of a Content Type other than a System Property, you must prefix the field name with the variable name of the Content Type and a period.
    • For example, when accessing the “Start Date” field of the “Event” Content Type, you must reference the field as “calendarEvent.startDate”.
    • If you omit the Content Type variable name, the query will not recognize the field (and may return no results).

All references to Content Types and fields must use this syntax, regardless of whether you are using Lucene syntax or full Elasticsearch JSON syntax. You must also use this syntax when specifying the sort field in dotCMS viewtools (such as content pulls and the Elasticsearch Viewtool).
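
The naming rules above can be sketched as a small helper. The function name and the SYSTEM_PROPERTIES set are assumptions made for illustration. The set is deliberately partial, and the sketch assumes that a Content Type does not define a custom field whose variable name collides with a system property.

```python
# Partial, illustrative list of system properties that are referenced by name only.
SYSTEM_PROPERTIES = {"title", "contenttype", "live", "working", "deleted", "identifier"}


def field_reference(content_type_var: str, field_var: str) -> str:
    """Return the name to use for a field in a query or sort clause.

    System properties are used as-is; every other field is prefixed with the
    variable name (not the display name) of its Content Type and a period.
    """
    if field_var.lower() in SYSTEM_PROPERTIES:
        return field_var
    return f"{content_type_var}.{field_var}"


if __name__ == "__main__":
    print(field_reference("calendarEvent", "startDate"))  # prints calendarEvent.startDate
    print(field_reference("calendarEvent", "title"))      # prints title
```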

Special Fields

The following fields do not exist in all Content Types, but have special uses within dotCMS. To use these fields, you must manually add them to a Content Type; once added, they enable additional dotCMS features which rely on the existence of these fields.

| Field | Type | Description |
|---|---|---|
| tags | string | You must create a tag field to be able to associate tags with a piece of content. Note that the tag field is very important for many Personalization features. |
| latlong | string | To enable geolocation queries on any Content Type, you must create a text field on that Content Type with the Velocity variable name latlong. This field takes a string value of latitude and longitude separated by a comma (e.g. “42.648899,-71.165497”). |
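
As a rough sketch of the latlong value format, the hypothetical helpers below build and parse the comma-separated string. The range checks are an added assumption (standard latitude and longitude bounds) and are not something the text above states dotCMS enforces.

```python
from typing import Tuple


def format_latlong(latitude: float, longitude: float) -> str:
    """Build the "latitude,longitude" string expected in a latlong field."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return f"{latitude},{longitude}"


def parse_latlong(value: str) -> Tuple[float, float]:
    """Split a "latitude,longitude" string back into two floats."""
    lat_text, lon_text = value.split(",")
    return float(lat_text), float(lon_text)


if __name__ == "__main__":
    print(format_latlong(42.648899, -71.165497))  # prints 42.648899,-71.165497
```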

File Metadata

In dotCMS Enterprise edition, file contents and metadata are indexed and searchable via Elasticsearch. Metadata can be queried using the pattern {content type variable name}.metadata.{metadata field}. For example: fileasset.metadata.contenttype:image/jpeg.

For more information on searching file metadata, please see the File Metadata documentation.
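
The metadata pattern can be sketched as follows. The helper name is hypothetical, and the value is passed through exactly as in the example above. Whether characters such as "/" need escaping depends on the query parser in use, which this sketch does not address.

```python
def metadata_term(content_type_var: str, metadata_field: str, value: str) -> str:
    """Build a `{content type}.metadata.{field}:{value}` term in lower case."""
    return f"{content_type_var}.metadata.{metadata_field}:{value}".lower()


if __name__ == "__main__":
    print(metadata_term("fileasset", "contenttype", "image/jpeg"))
```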

Raw Fields and Sorting

Elasticsearch automatically analyzes every indexed field, which changes the way the content of the field is stored. However, for every field that is indexed and analyzed in Elasticsearch, dotCMS adds an additional field of the same name with _dotraw appended (for example, news.byline_dotraw). This field stores the “raw” value of the field, which is not analyzed by Elasticsearch.

By default, Elasticsearch cannot sort by fields that have been full-text analyzed. If you attempt to sort by an analyzed field, the query will succeed but the results will not be sorted as expected. To sort these fields properly, you must instead sort by the raw version of the field. Because the raw field is not analyzed or modified by Elasticsearch, sorting on it orders results by the stored value rather than by analyzed tokens.
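
A minimal sketch of sorting on a raw field follows. The helper names are hypothetical. The request body uses the standard Elasticsearch query_string query and sort clause; whether a given dotCMS endpoint or viewtool accepts this body unchanged is an assumption.

```python
import json

RAW_SUFFIX = "_dotraw"


def raw_field(field: str) -> str:
    """Return the non-analyzed companion of an indexed field."""
    return field if field.endswith(RAW_SUFFIX) else field + RAW_SUFFIX


def sorted_search(query: str, sort_field: str, descending: bool = False) -> dict:
    """Build a search body that sorts on the raw version of sort_field."""
    order = "desc" if descending else "asc"
    return {
        "query": {"query_string": {"query": query}},
        "sort": [{raw_field(sort_field): {"order": order}}],
    }


if __name__ == "__main__":
    body = sorted_search("+contenttype:news", "news.byline")
    print(json.dumps(body, indent=2))
```

Here the sort clause refers to news.byline_dotraw, while the query itself can still match against the analyzed news.byline field.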

Note: Custom Sorting

Elasticsearch provides sophisticated methods to perform custom sorts based on almost any field or combination of fields. To learn more about Elasticsearch custom sorting capabilities, please see the Elasticsearch Sorting documentation.

In addition, you can perform customized sorting by creating custom fields which use Velocity code to construct customized search keys or tokens. This gives you fairly sophisticated sorting with relatively simple Velocity code, without the need to learn more advanced Elasticsearch functionality. For more information on custom fields, please see the Custom Fields documentation.

Content Permissions

dotCMS ensures that a user has appropriate permissions for any content returned in Elasticsearch query results; results which a user does not have permissions for are not returned to that user.

Note that this means different users running the same Elasticsearch query may receive different results. Therefore, when troubleshooting queries and query results, make sure to take the user's permissions into account.