Metadata Extraction & Indexing

Last Updated: Apr 13, 2022
documentation for the dotCMS Content Management System

You may dynamically extract metadata information from files using Velocity code. For information on how metadata is extracted on upload, or how it displays on file content, please see the File Metadata documentation.

Searching Metadata Fields

The types of metadata that dotCMS stores are configurable and intentionally limited by default. For more information, please see the How Content is Mapped to ElasticSearch documentation concerning metadata.

Backend Search

Once you have uploaded your files, the metadata fields and values are stored in the database and also indexed using ElasticSearch. You can search the Metadata fields using the Content Search feature in the dotCMS backend UI:

  1. Go to the Content tab.
  2. Select Type: File Asset.
    • This displays all the file content you have created or uploaded.
  3. Click Advanced in the left sidebar.
  4. Enter the information you wish to search for in the Metadata field.

For example, to search for JPG images enter: contentType:image/jpeg. To search for all PDF documents enter: contentType:application/pdf.

File Search

ElasticSearch

You may search file metadata using ElasticSearch by adding a +metaData term to the query:

+contentType:FileAsset +metaData.{fieldname}:*{value}*

For example, to search for all JPEG images, you can use the following search term:

+contentType:FileAsset +metaData.contentType:*image/jpeg*
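The same query pattern can be assembled in Velocity and passed to a content pull. The following is a minimal sketch, not a definitive implementation: the variable names $fieldName, $fieldValue, and $query are hypothetical, and the limit of 10 is arbitrary.

```velocity
## Hypothetical variables: the metadata field to search and the value to match
#set($fieldName = "contentType")
#set($fieldValue = "application/pdf")

## Build the query string following the +metaData.{fieldname}:*{value}* pattern
#set($query = "+contentType:FileAsset +metaData.${fieldName}:*${fieldValue}*")

## Pull up to 10 matching files, most recently modified first
#foreach($file in $dotcontent.pull($query, 10, "modDate desc"))
    <p>$file.title</p>
#end
```

Keeping the field name and value in variables makes it easy to reuse the same loop for other metadata fields.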

Accessing File Contents

You can also search inside file contents using the content: keyword. This works both in backend Content search and in ElasticSearch queries.

The following example, run from the dotCMS backend, searches for all files that contain the term footer-nav in their text:

[Image: Searching for Images]

The following search terms perform the same search within an ElasticSearch query:

+contentType:FileAsset +metaData.content:*footer-nav*

Image width and height are also stored and searchable in the metadata by default:

[Image: Searching for Images]

The same search can also be written as an ElasticSearch query on the metadata:

+contentType:FileAsset +metaData.width:*4032*
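As a rough sketch, the width query above could be run from Velocity to list matching images with their dimensions. The key name height is an assumption based on the statement that image height is stored. The limit of 5 is arbitrary.

```velocity
## Query taken from the example above; the limit of 5 is arbitrary
#foreach($file in $dotcontent.pull("+contentType:FileAsset +metaData.width:*4032*", 5, "modDate desc"))
    ## "height" is an assumed key name; confirm it in the file's metadata
    <p>$file.title: $!{file.metaData.width} x $!{file.metaData.height}</p>
#end
```

The $!{...} quiet notation renders an empty string when a key is missing, so files without dimension metadata do not print the raw reference.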

Displaying Metadata Fields and Values

You may access metadata fields from any files retrieved using a content pull in your Widgets and other Velocity code, and you may create ElasticSearch queries using the metadata fields.

Note: The metadata keys and values for each file are stored as a JSON string.

In your code, you can access the metadata in three different ways:

  • Retrieve and process the complete Metadata string (as a JSON string):
    $file.metaData
    
  • Loop through all individual Metadata fields:
    #foreach($field in $file.metaData.entrySet())
        $field.key : $field.value
    #end
    
  • Access individual Metadata fields by name:
    $file.metaData.contentType
    $file.metaData.fileSize
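
Not every file carries every metadata key (for example, a PDF may have no width), so it can help to guard individual lookups. The following is a sketch that assumes $file is a File Asset returned by a content pull. It uses only standard Velocity constructs: an #if test and the quiet reference notation.

```velocity
## $file is assumed to be a File Asset returned by a content pull
#if($file.metaData.width)
    <p>Width: $file.metaData.width</p>
#else
    <p>Width: not available</p>
#end

## Quiet reference: renders nothing when the key is missing
<p>File size: $!{file.metaData.fileSize}</p>
```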
    

Example

The following example searches for the two most recently modified JPEG image files and displays their metadata in all three ways listed above.

#foreach($file in $dotcontent.pull("+contentType:FileAsset +metaData.contentType:*image/jpeg*",2,"modDate desc"))
    <h2>File: $file.title</h2>
    <h3>MetaData:</h3>
    <p>$file.metaData</p>
    <ul>
    ## Loop through all metadata keys and values
    #foreach($field in $file.metaData.entrySet())
        <li><b>$field.key</b> $field.value</li>
    #end
    </ul>
    <h3>Print each value separately:</h3>
    <p>Content type: $file.metaData.contentType</p>
    <p>File Size: $file.metaData.fileSize</p>
#end

The following image displays the results of a widget using the above code on a sample site:

[Image: File Listing Widget]
