AWS DMS (Database Migration Service) supports ongoing replication to keep your source and target databases synchronized. However there are some pitfalls you should be aware of before you use it in production.
This post lists some of the issues you may run into with an ongoing DMS task that keeps a target database in sync with a source RDS SQL Server database.
You might notice corrupted rows in your target database - where some rows have values from other rows. This happens when DMS tries to replicate
An easy way to identify this is to set up a separate full load DMS task to sync the same data to a set of reference target tables. After this task completes, the original target tables (which are replicating). If everything is fine, these two sets should match.
If you notice any discrepancy, a short term fix is to do a periodic reload of the target tables. However, not all data will available during the reload. Also, this is not a feasible solution if your target table has foreign key references.
If you really need ongoing replication, you need to convert the
nvarchar(max) columns on the source to fixed length columns. This will require recreating constraints, indexes and statistics which use those columns.
3.4.3 - Minor Release, Major Bug
DMS 3.4.3 has quite a few bug fixes and new features. You might be tempted to upgrade because its a minor release from 3.4.2. Or you might have been automatically upgraded because you had Minor version automatic upgrade set to Yes on the replication instance.
If you have backups enabled for your SQL Server RDS, your DMS tasks will start failing with the message Last Error AlwaysOn BACKUP-ed data is not available (credits to https://dba.stackexchange.com/a/286367 for identifying the root cause).
It is not simple to downgrade since DMS does not allow you to go back to a lower engine version. You have to set up a new replication instance with the lower version. Then recreate all your tasks on it - you can’t move a task to a replication instance with a lower version.
Unrestricted Transaction Log Growth
DMS uses a dummy transaction to keep transactions on the transaction log (this speeds up replication since reading from the backup is slower). This dummy transaction is committed periodically to prevent the transaction log from growing. However if DMS loses track of this transaction (which it sometimes does), the transaction logs grow till all storage space is used up.
Contacting AWS Support is probably the best way to sort out this issue - in short, they will help you note which tables have CDC enabled, disable CDC, truncate the log and set up CDC again.
It is easy to run into this sort of issue if you sync a large volume of data every day. Unfortunately the only permanent fix is to move that part out of DMS - by using a bespoke solution.
Failing Tasks and Table Errors
Even if replication instances are reasonable provisioned and the volume of updates is normal, DMS can still error out once in a while.
One way to mitigate the impact of such errors is to set up a monitoring lambda that checks the task and table level status. You can then restart the task or reload table data - either manually or via another lambda. Note that target data will be fully or partly missing while the reload completes.
DMS Works, But…
DMS is a useful service for data migration. However it falls short of being a setup-and-forget solution for ongoing replication.
If you want to use DMS in production, make sure you monitor the source and target periodically to make sure everything is working as expected.