Recently I decided to test how changing the RMAN PARALLELISM setting (from 1 to 40) affects backup times under different storage types in Amazon AWS and to check the results.
Before starting, just a few points to refresh some concepts regarding RMAN:
- Backup bottlenecks are usually not around CPU. Generally speaking, the limit is the ability to write to disk or to the backup device;
- A channel represents one stream of data to a device (disk or sbt); see the sketch after this list;
- On Linux, a server session corresponds to a server process;
- On Windows, a server session corresponds to a thread;
- A backup or restore operation reads and processes data from the input device and writes data to the output device;
- Database Backup and Recovery User’s Guide: “The number of channels available for a device type when you run a command determines whether RMAN reads or writes in parallel. As a rule, the number of channels used in executing a command should match the number of devices accessed.”
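To make this concrete, here is a minimal sketch of the two ways to get multiple streams; the channel names and the parallelism value are illustrative, not taken from my tests:

    # Persistent setting: RMAN allocates this many disk channels automatically
    RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 2;

    # Equivalent manual allocation inside a RUN block
    RMAN> RUN {
            ALLOCATE CHANNEL ch1 DEVICE TYPE DISK;
            ALLOCATE CHANNEL ch2 DEVICE TYPE DISK;
            BACKUP DATABASE;
          }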
That said, let’s start. I ran a few RMAN backups using different PARALLELISM values: 1, 2, 5, 10, 20, 30 and 40. Then I extracted these results into a chart and checked the behavior.
This is the setup that I used in Amazon AWS.
Red Hat Enterprise Linux 7.3 | Oracle 12.2 | Storage 1: GP2, flexible 150-3000 IOPS | Storage 2: IO1, 20k IOPS | 8 vCPUs | 32 GB RAM
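For each run I changed the parallelism and took a full database backup, then read the elapsed time from V$RMAN_BACKUP_JOB_DETAILS. This is a sketch of one iteration rather than my exact script; it assumes disk channels and the default BASIC compression, and 5 is just one of the tested values:

    RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 5;
    RMAN> BACKUP AS COMPRESSED BACKUPSET DATABASE;

    -- Elapsed time and volume per backup job, newest runs last
    SQL> SELECT session_key, start_time, time_taken_display,
                input_bytes_display, output_bytes_display
           FROM v$rman_backup_job_details
          ORDER BY start_time;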
First chart: RMAN backups running on GP2 (150-3000 IOPS) storage, using RMAN’s default BASIC compression.
After a few executions, these are the results:
- There is a huge time gap between P1 and P5;
- From P5 onwards the times stabilize, varying by only a few seconds;
- After P20, RMAN seems to start “wasting” time allocating channels for the backup itself, or, I would say, the “ability to write to disk” gets exhausted. We can see this because at P30 the backup time starts to increase. During these tests I monitored the CPUs and they had plenty of headroom (see the I/O wait check after this list);
- The benefit of PARALLELISM in this case ends at P20, but depending on your strategy P5 might be a better option: the backup takes only a few seconds longer while allocating 5 channels instead of 20. Of course, this is a test scenario backing up only 20 GB of data. For databases in the terabyte or petabyte range it would be a completely different scenario, and I have neither the money to allocate that amount of storage nor the time to populate that amount of data, so I hope you understand. :)
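One way to check whether the write side is really the limit is to look at the backup I/O statistics. A hedged sketch, assuming asynchronous disk I/O (so the rows show up in V$BACKUP_ASYNC_IO): a high ratio of LONG_WAITS to IO_COUNT on the OUTPUT files suggests the backup destination cannot keep up:

    SQL> SELECT filename, type, io_count, long_waits,
                ROUND(long_waits / io_count * 100, 2) AS long_wait_pct
           FROM v$backup_async_io
          WHERE io_count > 0
            AND type = 'OUTPUT'
          ORDER BY long_wait_pct DESC;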
Second chart: RMAN backups running on IO1 (20k IOPS) storage, using RMAN’s default BASIC compression.
After a few executions, these are the results:
- There is a huge time gap between P1 and P5 (similar behavior to the first chart);
- From P5 onwards the times stabilize, varying by only a few seconds (similar behavior to the first chart);
- The “limit benefit” moved back from P20 to P10, perhaps a slight difference due to the IO1 storage type;
- P5 might offer the best cost vs. benefit for setting RMAN parallelism in this environment (see the configuration sketch below).
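If you settle on P5, making it persistent is a single command (shown here for disk; adjust the device type to your setup):

    RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 5;
    RMAN> SHOW DEVICE TYPE;   # confirm the new setting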
This was my basic test of RMAN PARALLELISM in Amazon AWS. If it is possible for you to test things in your environment before going to production, just do it. You will gain a better understanding of your RMAN environment and be able to choose a suitable degree of parallelism.
Hope it helps.
Regards,
Leonardo Bissoli.