Migrating Your Database to PostgreSQL

Last Updated: Apr 5, 2024
documentation for the dotCMS Content Management System

Beginning with dotCMS 23.06, MS SQL Server support has been deprecated; 23.01 LTS will be the last LTS major version to receive active MSSQL support. Similarly, 21.06 LTS, already past end of life by the time of writing, was the last major version to support either MySQL or Oracle databases.

Accordingly, PostgreSQL will soon be the only dotCMS-supported database. This page is intended to assist in dotCMS database migration to PostgreSQL.

Disclaimer: The information in this document are recommendations based on previous migration efforts; while this may suffice for many, unique database problems must be analyzed and resolved according to their specific characteristics and needs.

Requirements

  1. Linux-like server: We will assume the use of Amazon Linux 2.
  2. Resources: Since this server will run dotCMS, Postgres, and a second database, the recommendation is 16 GB of RAM and at least 4 CPUs.
  3. DBeaver 22.2.1+ (DBeaver Community)
  4. Postgres 12+ (15+ preferred)
  5. Docker
  6. Docker images: Postgres, dotCMS, and ElasticSearch

Migration

1. Upgrading to 21.06

If using a version prior to 21.06, we recommend first upgrading your dotCMS instance to dotCMS 21.06.14 LTS, the last patch in the 21.06 LTS series.

This is all the more crucial if you are using MySQL or Oracle; for either of those, 21.06 is the last-supported version. Once upgraded, begin with your dotCMS 21.06 docker instance configured to point to your MySQL or Oracle database.

2. Before Migration, Address Known Issues

a) Remove null characters

It has been found in some MySQL databases that some null characters caused the copy from MySQL to Postgres to crash. So far, the findings have been on the contentlet table and text_area1 field. To verify the presence of null characters, the following query can be run; if 0 results are returned, then the field is ok to be migrated:

select count(*) from contentlet where text_area1 like '%\0%';

In the event that null characters have been found, the issue can be fixed by running the following command before transferring the data to the PostgreSQL database:

UPDATE contentlet AS c1 
    INNER JOIN contentlet AS c2 ON c2.inode = c1.inode 
    SET c2.text_area1 = REPLACE(c1.text_area1,'\0','')  
WHERE c1.text_area1 like '%\0%';

Note: The latter code snippet has been tested on MySQL; it has been syntax-validated for, but not directly tested on, MS SQL Server and Oracle. On Oracle, CHR(0) may also be preferable to \0. This document will expand to reflect successful tests on MSSQL and/or Oracle.

Please keep in mind this fix takes as an assumption based on past findings that the field with the null character is text_area1 on the contentlet table. However, it is not impossible that such may be present on another field and another table. For this reason, it is important to remain mindful of any further errors during the copy process via DBeaver.

b) Drop old asset versions

Certain inconsistencies on the contentlet table can be fixed by deleting old asset versions.

In the Admin Panel, go to System → Maintenance → Tools and locate the Drop Old Assets Versions operation. For the “Remove assets older than” option, use 01/01/2010 as the date, and execute. Then repeat the same step incrementing one year (01/01/2011) and so on, until the present date. Remember to follow the logs for any issues.

As mentioned in the above-linked document, this process will never remove live or working versions of any assets, regardless of their age.

3. Get the Database Schema

First, download 21.06.16-schema-only-no-constraints.sql.

Please be aware that this dump contains NO data and NO constraints. It will serve as a base to facilitate the data import. Constraints will be imported later during this migration.

Note also that no database changes occurred between 21.06.11 and the remaining 21.06.x patches, and so this schema remains fully compatible with the final 21.06 patch.

4. Prepare the Target Database

Next, we'll build a target PostgreSQL database to receive the data.

Create a new database in Postgres with CREATE DATABASE dotcms and restore the dotCMS schema using the schema-only.sql file from the previous sections:

pg_restore -U <db_username> -v -d <database_name> <dump_file>

For example:

pg_restore -U postgres -v -d dotcms 21.06.16-schema-only-no-constraints.sql

5. Copy Data With DBeaver

Install DBeaver and connect to both databases — the legacy DB and the Postgres DB. Once that is done, we can start migrating data from the tables on the former to the same tables on the latter. The below examples will assume a MySQL legacy DB, but MS SQL Server and Oracle should proceed similarly.

a) Migrating data

Using DBeaver's Database Navigator, search for the tables on the MySQL database. For example, let’s migrate the image table. Once the table is selected, click on it and then Export Data:

DBeaver's Database Navigator's Export Data command.

Use the Database method to export:

DBeaver Data Transfer: Database Table target.

On the Tables mapping section, select the Postgres database if is not select by default:

DBeaver Data Transfer: select Postgres as target. Selecting: Postgres, Databases, dotcms, schemas, public.

You can make sure that the columns mappings are right by clicking on the arrow on the left:

Click the left arrow to expand the mapping.

If the table does not exist in the target, by default DBeaver will suggest creating it. However, three circled tables are no longer needed, so set the mapping to skip:

abcontact, abcontacts_ablists, and ablist tables set to skip.

The list below includes all tables that can be safely skipped, because they are temporal or related to the live environment:

abcontactdot_cluster
abcontacts_ablistsdist_journal
ablistdist_process
analytic_summarydist_reindex_journal
analytic_summary_404indicies
analytic_summary_pagesnotification
analytic_summary_periodpublishing_bundle
analytic_summary_refererpublishing_end_point
analytic_summary_visitspublishing_environment
analytic_summary_workstreampublishing_pushed_assets
analytic_summary_contentpublishing_queue
clickpublishing_queue_audit
clickstreampublishing_bundle_environment
clickstream_404pollschoice
clickstream_requestpollsdisplay
cluster_server_uptimepollsquestion
cluster_server_actionpollsvote
cluster_serversystem_event

Now, modify the extraction settings. These are the configurations that will use DBeaver to pull the data from the MySQL database. For more information please see DBeaver's data migration documentation:

Extraction settings.

If necessary, change the data load settings. These are the configurations used by DBeaver to insert the data into the Postgres database. Again, more detail can be found in DBeaver's data migration documentation:

Data load settings.

Finally, confirm the operation to begin the transfer:

Confirm to finish.

These steps should be performed on all tables, minus the exceptions in the table above.

b) Finishing the Migration

After copying all the tables from MySQL to Postgres, the last step is to recreate the constraints on the Postgres database.

Download 21.06.16-constraint-create.sql.

Now run the following command with reference to it:

psql -U dotcmsdbsuer dotcms < 21.06.16-constraint-create.sql

c) After Migration

There is a known issue that affects QRTZ jobs after migration. The following operations have been shown to fix it:

DELETE FROM qrtz_locks;
INSERT INTO qrtz_locks values('TRIGGER_ACCESS');
INSERT INTO qrtz_locks values('JOB_ACCESS');
INSERT INTO qrtz_locks values('CALENDAR_ACCESS');
INSERT INTO qrtz_locks values('STATE_ACCESS');
INSERT INTO qrtz_locks values('MISFIRE_ACCESS');

DELETE from QRTZ_EXCL_locks;
INSERT INTO QRTZ_EXCL_locks values('TRIGGER_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('JOB_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('CALENDAR_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('STATE_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('MISFIRE_ACCESS');

On this page

×

We Dig Feedback

Selected excerpt:

×