Achieve a high-performance migration to Amazon RDS for Oracle from on-premises Oracle with Amazon Web Services DMS
Many customers run their Oracle databases on premises and want to move them to a managed service such as Amazon RDS for Oracle. Multiple options are available to migrate Oracle databases to Amazon RDS for Oracle; for details, refer to the Amazon Web Services documentation on Oracle migration options. In this post, building on our last blog post, we focus on how to configure Amazon Web Services DMS to achieve a high-performance migration from an on-premises Oracle database to Amazon RDS for Oracle.
Solution overview
The following figure depicts how Amazon Web Services DMS pulls data from the on-premises Oracle database and pushes it to Amazon RDS for Oracle.
In the following sections, we discuss strategies for migrating large tables and LOB columns, splitting tables into multiple tasks, and other important considerations.
Prerequisites
Before you use Amazon Web Services DMS to migrate the database, you must have the following:
- A source Oracle database running on an existing on-premises environment
- A target Oracle database running on Amazon RDS for Oracle
- An Amazon Web Services DMS replication instance with the optimal instance type for your migration workload
- Network connectivity from the Amazon Web Services DMS replication instance to the source and target Oracle databases
- A database user account with the required privileges on the source and target Oracle databases
Additionally, before starting the migration, review the Amazon Web Services DMS limitations on using an Oracle database as a source and as a target.
Amazon RDS for Oracle considerations during migration
In this section, we discuss some considerations during the migration before diving into the migration methods.
Disable the archive log
Archived redo logs are copies of the online redo logs that Oracle retains for point-in-time recovery, and writing them consumes additional storage I/O. During the migration, the database on Amazon RDS for Oracle isn't available for production, so you can temporarily disable archived logs. This helps dedicate storage I/O to the data migration. In this case, running the database in noarchivelog mode, which on Amazon RDS means disabling automated backups, makes more sense.
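For example, a minimal Amazon Web Services CLI sketch (the instance identifier is a placeholder); setting the backup retention period to 0 disables automated backups and places the instance in noarchivelog mode:

```
# Disable automated backups, which switches RDS for Oracle to noarchivelog mode
aws rds modify-db-instance \
    --db-instance-identifier my-oracle-target \
    --backup-retention-period 0 \
    --apply-immediately
```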
Disable Multi-AZ
Amazon RDS provides the option of deploying databases in a Multi-AZ configuration, which synchronously replicates every write to a standby instance in another Availability Zone. That synchronous replication adds latency during a bulk data load, so you can temporarily convert the instance to Single-AZ for the migration and re-enable Multi-AZ before moving to production.
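A minimal sketch, again with a placeholder instance identifier:

```
# Temporarily convert the instance to Single-AZ for the duration of the load
aws rds modify-db-instance \
    --db-instance-identifier my-oracle-target \
    --no-multi-az \
    --apply-immediately
```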
Increase the redo log group size
By default, Amazon RDS for Oracle has four online redo logs of 128 MB each. If this is too small for your write volume, it causes frequent log switches and additional overhead on the database, and you may see checkpoint not complete messages in the alert log.
This message means that redo logs are being switched so quickly that the checkpoint associated with a log switch can't complete. The Oracle LGWR process has filled all the redo log groups and must wait for the first checkpoint to complete before a group can be reused; until then, processing stops. Each individual wait is short, but because it happens many times, it degrades the overall database migration. To address this, it's recommended to replace the default redo log groups with larger ones.
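On Amazon RDS for Oracle you can't run ALTER DATABASE ADD LOGFILE directly; the rdsadmin package is used instead. A minimal sketch (the 1 GB size is an assumption; size the logs based on your redo generation rate):

```sql
-- Add larger redo log groups (run as the master user)
EXEC rdsadmin.rdsadmin_util.add_logfile(p_size => '1G');
EXEC rdsadmin.rdsadmin_util.add_logfile(p_size => '1G');
EXEC rdsadmin.rdsadmin_util.add_logfile(p_size => '1G');
EXEC rdsadmin.rdsadmin_util.add_logfile(p_size => '1G');

-- Force a switch and checkpoint, then drop the original 128 MB groups
-- once V$LOG shows them as INACTIVE
EXEC rdsadmin.rdsadmin_util.switch_logfile;
EXEC rdsadmin.rdsadmin_util.checkpoint;
SELECT group#, bytes, status FROM v$log;
EXEC rdsadmin.rdsadmin_util.drop_logfile(grp => 1);
```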
Before moving to production, make sure to re-enable archive logging (automated backups) and Multi-AZ, and set the redo log size so that log switches happen at an interval of 15–30 minutes.
Create the primary key and secondary indexes after full load
Disabling the primary key and secondary indexes can significantly improve the performance of the load. With Amazon Web Services DMS, you have the option to create the primary key after loading the data by changing the task setting CreatePkAfterFullLoad.
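A sketch of the relevant FullLoadSettings; note that CreatePkAfterFullLoad only takes effect when Amazon Web Services DMS creates the target tables (TargetTablePrepMode set to DROP_AND_CREATE):

```json
{
  "FullLoadSettings": {
    "TargetTablePrepMode": "DROP_AND_CREATE",
    "CreatePkAfterFullLoad": true
  }
}
```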
Migrate large tables
By default, Amazon Web Services DMS uses a single thread to read data from the table to be migrated. This means Amazon Web Services DMS initiates a single SELECT statement on the source table during the full load phase. That statement does a full table scan, which isn't recommended on large tables, because it scans all blocks sequentially and generates a huge amount of unnecessary I/O. This eventually degrades the overall performance of the source database as well as the migration speed.
Although this is acceptable for smaller tables, performing a full table scan on a large table could slow the entire migration effort and could also result in ORA-01555 (snapshot too old) errors.
Amazon Web Services DMS provides built-in options to load tables in parallel, using partitions, subpartitions, or ranges of column values (range segmentation).
Using parallel load on an indexed column significantly improves performance, because it forces the optimizer to use an index scan instead of a full table scan during unload. An important factor in choosing range segmentation is identifying the right number of partitions and making sure the data is distributed evenly across them. For example, if you have a table with 2 billion rows and you're using only five ranges to load it in parallel, each range query still reads hundreds of millions of rows, so you're effectively still scanning the entire table.
One way to verify that the SELECT query being sent by Amazon Web Services DMS to the source database isn't doing a full table scan is to review its execution plan on the source database.
The following query uses the ntile analytic function to split a column's values into partitions with an equal number of rows in each. It also returns the maximum value in each partition, which you can use as the range boundaries in your table mapping to load the table in parallel. You can use this approach even if your column is a UUID.
Make sure to run it in your standby or development instance before running on your primary.
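A minimal sketch of such a query; the schema, table, column, and partition count match the example that follows (adjust them for your own table):

```sql
-- Split VIQASH.DUMMY_TABLE into 16 equal-row buckets on column ID and
-- report each bucket's row count and maximum ID (the range boundary)
SELECT bucket,
       COUNT(*) AS rows_in_bucket,
       MAX(id)  AS boundary_value
FROM (
    SELECT id,
           NTILE(16) OVER (ORDER BY id) AS bucket
    FROM   viqash.dummy_table
)
GROUP BY bucket
ORDER BY bucket;
```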
The following screenshot shows an example of the query output. ID is the column on which we plan to create range boundaries for table VIQASH.DUMMY_TABLE. The number of partitions specified is 16, which is enough for a 10-million-row table.
In table mappings, this can be specified for a non-partitioned table in JSON. For more information, refer to Table and collection settings rules and operations in the Amazon Web Services DMS documentation.
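A sketch of the table-settings rule; the boundary values shown are illustrative and would come from the ntile query output:

```json
{
  "rule-type": "table-settings",
  "rule-id": "2",
  "rule-name": "2",
  "object-locator": {
    "schema-name": "VIQASH",
    "table-name": "DUMMY_TABLE"
  },
  "parallel-load": {
    "type": "ranges",
    "columns": ["ID"],
    "boundaries": [
      ["625000"],
      ["1250000"],
      ["1875000"]
    ]
  }
}
```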
In our test environment, we compared the default single-threaded load with a parallel load using ranges. The first graph shows throughput with the single-threaded method, without the parallel load option.
The following graph shows throughput with 24 ranges.
Keep in mind that performance can vary widely based on different factors, such as the following:
- Network bandwidth – For example, if you have a shared link (VPN) between the on-premises database and the Amazon Web Services Region, you might not see the same performance as in our example, because you might be saturating the network bandwidth. This can limit the load, and you might not be able to use higher parallelism. On the other hand, if you use a dedicated link such as Amazon Web Services Direct Connect, you might see better performance when loading a large dataset.
- LOBs – The more LOB columns your table has, the more memory is required to load it, and provisioning a replication instance with too little memory will impact the performance of the load. Additional options for replicating LOB data efficiently are covered in the next section.
- Replication instance – Having more CPU and memory is ideal for parallelism. For guidance on choosing the best size, refer to Choosing the best size for a replication instance.
Based on our test, it took us half the time to load the table when using parallel load with optimal configuration, and we avoided full table scans on the source database while unloading the data.
Migrate LOB columns
Migrating LOB columns is challenging for any database migration because of the large amount of data they hold. Amazon Web Services DMS provides three options to migrate LOB columns: limited, full, and inline mode.
Limited LOB mode
This is the preferred mode for any LOB column migration because it instructs Amazon Web Services DMS to treat LOB columns as VARCHAR data types, allowing them to be fetched inline with the rest of the row during the full load. Any data in the column greater than the max LOB size (KB) setting is truncated, and a warning is written to the Amazon Web Services DMS logs.
To identify the maximum size of LOB columns (in KB), you can use the following query:
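A sketch of such a query; the schema, table, and column names are placeholders:

```sql
-- Maximum LOB length in the column, reported in KB
SELECT CEIL(MAX(DBMS_LOB.GETLENGTH(lob_column_name)) / 1024) AS max_lob_size_kb
FROM   schema_owner.table_name;
```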
Therefore, if you have LOB columns larger than 63 KB, consider full LOB mode or inline LOB mode, which may give better performance when loading your table.
Full LOB mode
This mode migrates the entire column irrespective of the LOB size. This is the slowest option for LOB migration because Amazon Web Services DMS has no information about the maximum size of LOBs to expect. It just uses LOB chunk size (KB), which by default is 64 KB, to distribute the LOB data in pieces. Then it sends all chunks to the target, piece by piece, until the entire LOB column is migrated.
Full LOB mode performs a two-step approach to loading the data: it inserts an empty_clob or empty_blob on the target, followed by a lookup on the source to update the LOB value on the target. Therefore, you need a primary key on the source table to load LOBs using full LOB mode. If you don't have a primary key, Amazon Web Services DMS prints a message in the task logs.
Inline LOB mode
This mode is a hybrid approach that combines the functionality of limited LOB mode and full LOB mode, and provides better performance when loading LOB objects. Based on your task configuration, Amazon Web Services DMS will choose to perform inline or LOB lookup based on the size of the data. Because it uses full LOB mode as well, it’s also mandatory to have a primary key or unique key on the source table. The maximum value allowed for inline LOB size is also 2 GB, which is the same as for limited LOB mode.
Understanding your LOB data
As stated earlier, migrating LOBs is tedious in general, so it’s important to understand the LOB distribution of your table before choosing the setting. If your distribution varies a lot across your table, it makes it more challenging to load the data. The following query could be useful for analyzing your LOB distribution and choosing the right setting for your task. This script creates a table for analyzing the LOB distribution of your table:
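The following is a sketch of such a script; the helper table name lob_distribution is our own, and LOB-COLUMN-NAME and LOB_TABLE are placeholders explained in the note below:

```sql
-- Capture the length of every LOB value into a helper table for analysis
-- (LOB-COLUMN-NAME and LOB_TABLE are placeholders; see the note that follows)
CREATE TABLE lob_distribution AS
SELECT DBMS_LOB.GETLENGTH(LOB-COLUMN-NAME) AS lob_size_bytes
FROM   LOB_TABLE;
```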
*Replace LOB-COLUMN-NAME with the actual LOB column and LOB_TABLE with the actual table name.
Make sure to run it in your standby or development instance before running on your production.
After the table is created, you can use the following query to check LOB distribution in the source table:
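A sketch, assuming the lob_distribution helper table from the previous script:

```sql
-- Rows per distinct LOB size, plus each size's share of the table's rows
SELECT lob_size_bytes,
       COUNT(*) AS num_rows,
       ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 2) AS pct_of_rows
FROM   lob_distribution
GROUP  BY lob_size_bytes
ORDER  BY lob_size_bytes;
```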
The output is similar to the following screenshot (based on the number of rows you have in the source database table).
This query output highlights that there are six rows with a LOB size of 5 bytes, which account for 85% of the rows in the table (the first row in the query output). There is only one row of 120,045,568 bytes (115 MB), which accounts for the remaining 14% of rows but nearly all of the LOB data in the table.
Based on this information, it will be more efficient if we migrate six rows with small LOB size using inline LOB mode, and just one row using full LOB mode. To achieve this, we can modify the task settings as follows:
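A sketch of the relevant TargetMetadata task settings; InlineLobMaxSize follows the example below, and the remaining values are assumptions based on the defaults:

```json
{
  "TargetMetadata": {
    "FullLobMode": true,
    "LobChunkSize": 64,
    "LimitedSizeLobMode": false,
    "LobMaxSize": 0,
    "InlineLobMaxSize": 32
  }
}
```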
With this task setting, the six rows whose LOBs are 5 bytes each are migrated inline, because the InlineLobMaxSize parameter is set to 32 KB: any row with a LOB smaller than 32 KB is loaded using the limited LOB mode method. The row with the 115 MB LOB is migrated using the two-step full LOB mode process.
Split tables into multiple Amazon Web Services DMS tasks
For large databases, you must always understand and plan the migration first. The most important part of this is deciding how many tasks need to be created in total. One important reason for creating multiple tasks is to better utilize Amazon Web Services DMS replication instance resources (CPU, memory, and disk I/O). For example, a single task migrating 100 tables won’t be able to utilize all resources, compared to 10 tasks with each task migrating 10 tables.
The replication instance type plays a vital role in deciding how many Amazon Web Services DMS tasks can run on it in parallel, because that is entirely dependent on the resources available on the replication instance. Monitor this by looking at the Amazon CloudWatch metrics for the replication instance, such as CPUUtilization, FreeableMemory, SwapUsage, and disk I/O.
By default, Amazon Web Services DMS migrates eight tables in parallel per task, which is controlled by the task setting MaxFullLoadSubTasks (maximum value 49).
In a test environment (MaxFullLoadSubTasks set to the default of eight) where we had a 58 GB table with 4.5 TB of LOB data and a maximum LOB size of 80 MB, the Amazon Web Services DMS estimate was around 5 days to migrate all the data. This estimate was calculated by identifying the number of rows transferred in a few hours. With MaxFullLoadSubTasks set to 49, Amazon Web Services DMS was able to migrate all the data in 20 hours.
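The setting lives in the FullLoadSettings section of the task settings JSON, for example:

```json
{
  "FullLoadSettings": {
    "MaxFullLoadSubTasks": 49
  }
}
```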
Other important Amazon Web Services DMS parameters to consider
When using parallel load on the target with a direct path load, it's important to disable the constraints on the target and also add the direct path settings on the target endpoint to avoid errors during the load. With directPathNoLog enabled, the target database won't generate redo logs while loading the data.
For more information about directPathNoLog and directPathParallelLoad, refer to the Oracle target endpoint settings in the Amazon Web Services DMS documentation.
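As a sketch, these can be supplied as extra connection attributes on the Oracle target endpoint; verify the attribute names against the current Amazon Web Services DMS documentation:

```
useDirectPathFullLoad=true;directPathNoLog=true;directPathParallelLoad=true;
```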
Summary
In this post, we discussed how to efficiently migrate an on-premises Oracle database to Amazon RDS for Oracle using Amazon Web Services DMS. We also looked at various Amazon Web Services DMS settings and used SQL queries to identify the ideal configuration, which will help when migrating your database.
Try these settings in your next Oracle database migration to Amazon RDS for Oracle. If you have any comments or questions, leave them in the comments section.
About the Authors
Viqash Adwani is a Sr. Database Specialty Architect with Amazon Web Services. He works with internal and external Amazon customers to build secure, scalable, and resilient architectures in the Amazon Web Services Cloud and help customers perform migrations from on-premises databases to Amazon RDS and Amazon Aurora databases.
Aswin Sankarapillai is a Sr. Database Engineer in Database Migration Service at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database migration projects, helping them improve the value of their solutions when using Amazon Web Services.