10G Data Pump Part 1 - Features and Benefits
Now that I've used Data Pump, I've found that the improvements go far beyond mere performance enhancements. In part 1 of this multi-part series on the Data Pump toolset, I'll cover some of the new features and functionality the utility provides. In upcoming blogs, we'll take a look at the Data Pump architecture and compare the toolset to its ancestors, Export and Import.
I thought I would start the series off by providing you with a laundry list of benefits the new utility provides. This is only a partial set of features. If you would like to chime in with a few of your own or you have some helpful hints (or warnings) to provide, please feel free to respond with a comment.
The intent of this blog is to generate interest in the Data Pump utilities. In my next blog, I'll continue our discussion on Data Pump by providing you with a general overview of its architecture and finish the series by showing you how to use Data Pump. I'll provide a few examples and screen shots to enhance the learning process along the way.
Before we begin let's look at a very quick example of a Data Pump Export command line:
expdp foot/foot directory=export_dir dumpfile=expfull.dmp logfile=expfull.log full=y
"expdp" tells the operating system to run the Data Pump Export utility. The parameters that follow are used to control the job's execution. In our example, the account and password are both "foot", the file that will contain the output data is "expfull.dmp", the logfile is"expfull.log" , and a full database export will be performed.
Most of the parameters are simple, but a further explanation of the DIRECTORY parameter is in order. Because a Data Pump job is executed in the database, you must create a directory object using a SQL statement on the database being accessed. If you don't pre-create the directory, Oracle provides a default directory called DATA_PUMP_DIR. That means users are unable to fully qualify the output file and log file as they were able to do in the non-Data Pump versions of Export and Import. You specify a fully qualified directory on the server when you create the new database directory object and then you must grant access to that directory. I'll show you examples of this in next week's blog.
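As a quick preview, the two statements involved look something like the ones below. The path is purely an illustration for this blog's environment, and they need to be run from an account with the CREATE ANY DIRECTORY privilege (SYSTEM, for example):
CREATE OR REPLACE DIRECTORY export_dir AS '/u01/app/oracle/export';
GRANT READ, WRITE ON DIRECTORY export_dir TO foot;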
The Oracle manual provides an entire listing of parameters that can be used as input to the Data Pump Export and Import utilities. I can choose to export a single table or an entire schema, or parallelize the job's execution, to name just a few of the options available. I can exclude objects, include only a set of objects and transfer data across the network directly from one database to another. Let's get started by looking at just a few of the benefits Data Pump provides.
Performance Benefits
Oracle documentation states that Data Pump's performance on data retrieval is 60% faster than Export and 15 to 20 times faster on data input than Import. I have run some simple tests to roughly compare the performance of the older versions of Export and Import to their Data Pump counterparts (also called Export and Import) and have found the claims to be pretty close. But as with all such claims, your performance may vary - as mine did. I will say that Data Pump is very fast compared to its ancestors. To keep the discussion from getting confusing, I'll refer to the 10G versions of Export and Import as Data Pump Export and Data Pump Import.
Oracle documentation states that the following factors contribute to the performance improvements:
- Parallelism - Multiple worker processes are able to perform inter-table and inter-partition work in parallel. This allows administrators to load and unload database tables using parallel processing. Data Pump Import takes advantage of parallelism to build indexes and load package bodies.
The most beneficial aspect of Data Pump's parallel processing feature is that it is dynamic. Administrators are able to attach (connect) to a Data Pump job that is currently executing and dynamically alter the number of parallel processes it uses. I'll show a quick command-line example right after this list, and I'll show you how to dynamically activate parallelism in the demo at the end of this blog.
- DBAs that ran Export and Import remotely (not logged on to the target server) were required to transfer data to their local machines. Data Pump Export reads and writes files directly to the server and does not require any data movement over the network to the client machine. Data Pump jobs are scheduled and processed within the database engine as opposed to being a separate client-side process.
- The format of the Data Pump Export output files is very similar to the format of the data stored in the Oracle tablespace datafiles. Sounds pretty logical to me. If you want to transfer data from a table to a flat file and back into another table, it makes sense to keep the format of the output datafile as close to the format of a database datafile as possible.
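To give you a feel for it, a parallel version of the earlier full export might look like the line below. The degree of 4 is just an illustrative value, and the %U substitution variable lets Data Pump create one dump file per worker process:
expdp foot/foot directory=export_dir dumpfile=expfull%U.dmp logfile=expfull.log full=y parallel=4
Once attached to the running job (attaching is covered next), the interactive PARALLEL command changes the degree on the fly:
Export> PARALLEL=8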
Attaching and Detaching
Because Data Pump runs in the database, you can detach the client from a Data Pump job and reattach it at any time to monitor and/or change execution parameters. The reattach can be performed as often as required during the job's lifecycle. If there are numerous jobs running, you attach to a specific job by specifying the job's name in the ATTACH parameter.
This feature allows administrators to start a Data Pump job, detach from it and attach to it from time to time to monitor its execution. Detaching from an active Data Pump job will feel a little strange at first because you have to hit the CONTROL-C combination that we all use to try and break out of something that is stuck. The first time I did it, I thought "I'll break out of this and the job will stop." Wrong. As you will see in the demo, you can attach and detach as often as you would like.
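For example, assuming the full export above received the default job name SYS_EXPORT_FULL_01 (your job name may differ - DBA_DATAPUMP_JOBS will tell you), reattaching from the command line looks like this:
expdp foot/foot attach=SYS_EXPORT_FULL_01
Once attached you land at the interactive Export> prompt, where EXIT_CLIENT detaches again while leaving the job running on the server.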
Job Status Information
Oracle provides the STATUS command to allow administrators to view the status of the job they are attached to. It provides virtually all of the information needed to monitor the job's execution. In addition, you can use STATUS=n (where n is a number of seconds) to obtain a status display at timed intervals.
You can also use DBA_DATAPUMP_JOBS to view general information on all active Data Pump jobs and DBA_DATAPUMP_SESSIONS to view information on user sessions that are attached to active Data Pump jobs. In addition, administrators are able to use the tried-and-true V$SESSION_LONGOPS to review how much work the Data Pump job has completed and how much more work it needs to perform.
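A couple of simple queries along those lines (column lists trimmed down for readability):
SELECT owner_name, job_name, operation, job_mode, state, degree FROM dba_datapump_jobs;
SELECT opname, sofar, totalwork FROM v$session_longops WHERE totalwork > sofar;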
Space Estimation
Administrators are able to estimate the size of a Data Pump Export output file by using the ESTIMATE and ESTIMATE_ONLY parameters. I have heard comments from some administrators that the estimates can be time consuming. Although I haven't performed this command on very large sets of data, it seemed to work pretty well for me. The ESTIMATE parameter has two options:
- ESTIMATE=STATISTICS - Estimates the size of the output file by using the data dictionary statistics associated with the objects being exported. The statistics will need to be current for this option to be accurate.
- ESTIMATE=BLOCKS - Estimates the size of the output file by multiplying the number of blocks each object being exported consumes times the database block size.
You would place the parameter above on the command line. But if you only want to estimate the size of the output file, you'll need to remember to use the "ESTIMATE_ONLY=y" parameter in conjunction with the ESTIMATE parameter. If you don't, you will estimate the size of the output file and then perform the export as well.
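A schema-level, estimate-only run for the foot schema would look something like the line below. Notice there is no DUMPFILE parameter - none is allowed when ESTIMATE_ONLY=y because no dump file gets written:
expdp foot/foot directory=export_dir logfile=estimate.log schemas=foot estimate=blocks estimate_only=y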
Interruptible and Restartable
Data Pump jobs update a master table with all pertinent information about the job being executed. The master table knows about the objects being exported or imported and their location in the dump file set. As a result, a Data Pump job can be restarted in the event of a planned or unplanned stoppage. The Data Pump job accesses the master table to determine where to start and picks up right where it left off. Administrators are able to use the STOP_JOB and START_JOB commands on the client to temporarily stop jobs and restart them when desired.
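Sticking with the hypothetical SYS_EXPORT_FULL_01 job name from earlier, a stop-and-restart sequence looks roughly like this:
expdp foot/foot attach=SYS_EXPORT_FULL_01
Export> STOP_JOB=IMMEDIATE
Later, reattach the same way and issue:
Export> START_JOB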
Database to Database Transfers
Data Pump Import allows users to transfer data directly from a source database to a target database without using disk as a temporary staging (and stopping) point. You use the NETWORK_LINK parameter to specify a pre-created database link that points to the source database. The server connects to the source database specified in the database link, retrieves the data and writes it directly to the target database. No dump files required!
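Sketching it out with made-up names (a database link called source_db pointing at a TNS alias SOURCEDB), the pre-created link and the import would look something like this:
CREATE DATABASE LINK source_db CONNECT TO foot IDENTIFIED BY foot USING 'SOURCEDB';
impdp foot/foot directory=export_dir network_link=source_db schemas=foot logfile=netimp.log
The directory object is still needed for the log file, but notice that there is no DUMPFILE parameter.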
Recreates Schema Users and Grants in Schema Mode
As stated previously, users have the option of choosing to export an entire schema. Let's say you wanted to refresh an entire schema in a target database from an export file you have available. The easy way to create all of the tables from scratch is to drop the user with the cascade option, recreate the user, grant the user the privileges they used to have and use the Import utility to recreate the schema objects. The problem with the older version of Import was that because it didn't recreate the user (just the user's objects), the administrator was forced to save creation and grant statements to recreate the user with the same privileges it had before the drop was performed. A Data Pump schema import will recreate the user and execute all of the associated grants, user password history, etc.
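A schema-mode refresh along those lines, run from a suitably privileged account so the FOOT user itself can be recreated (and assuming expfull.dmp contains the user definition), would look roughly like this:
impdp system/password directory=export_dir dumpfile=expfull.dmp schemas=foot logfile=impfoot.log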
Fine-Grained Selectivity
Being able to narrow the scope of the work performed is critical to most of the administrative functions we perform. The tighter we can focus on the set of objects we are administering, the more efficient and safer our administrative activities will be. Data Pump provides several parameters that help administrators narrow the scope of objects being processed (a few command-line examples follow the list):
- EXCLUDE - Notifies Data Pump to exclude the specified object type from the export or import operation. Multiple excludes can be used together to exclude more than one object type from the operation. For example, "EXCLUDE=function" would exclude all of the functions from a Data Pump operation. SQL-like syntax can be used to provide even greater selectivity.
- INCLUDE - Notifies Data Pump to include only those object types specified. All other object types will not be included in the export or import operation. Like its EXCLUDE counterpart, a SQL-like syntax can be used to provide greater selectivity.
- CONTENT - Notifies Data Pump to include only the metadata (CONTENT=METADATA_ONLY), only the data (CONTENT=DATA_ONLY) or both metadata and data (CONTENT=ALL).
- QUERY - Provides users with the ability to use SQL statements to limit the rows processed by the Data Pump operation. A limited version of this feature was available in the older releases of Export and Import. The QUERY feature has been improved by allowing the statement to be qualified with a table name to apply the statement to a specific table. In addition the QUERY option can now be used in both Data Pump Export and Import operations.
- FLASHBACK (the FLASHBACK_SCN and FLASHBACK_TIME parameters) - Allows Data Pump Export and Import to restrict data to a specific time in the past. Being able to go "back in time" and retrieve data seems to be an extremely valuable Data Pump feature to me. One example would be to view before images of data that was changed incorrectly. If an application program runs amok and changes data erroneously, the Data Pump utility could be used to go back in time and populate data in a different schema to allow the development team to identify the differences at their leisure.
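Here are the promised examples. The object types, table and WHERE clause are just placeholders, and depending on your operating system the quotation marks in the QUERY value may need escaping (or you can tuck the whole thing into a parameter file with PARFILE):
expdp foot/foot directory=export_dir dumpfile=noprocs.dmp schemas=foot exclude=function exclude=procedure
expdp foot/foot directory=export_dir dumpfile=meta.dmp schemas=foot content=metadata_only
expdp foot/foot directory=export_dir dumpfile=dept.dmp tables=dept query=dept:"WHERE emp_id > 1000"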
Transformations
Older versions of Export and Import did not provide users with the ability to change an object's storage characteristics. For example, there wasn't a real simple way to export a set of objects from one tablespace and import them into another tablespace. There were a few tricks that could be performed to make this happen, but they certainly weren't as easy or as straightforward as Data Pump's transformation features.
Data Pump allows users to easily change the target tablespace that the objects will be created in (REMAP_TABLESPACE), migrate objects from one schema to another (REMAP_SCHEMA) and move databases across platforms that have different file system semantics (REMAP_DATAFILE).
The TRANSFORM parameter can also be used to alter object creation DDL for specific objects. For example, this syntax:
TRANSFORM=SEGMENT_ATTRIBUTES:n:table
Could be used to change this object creation statement:
CREATE TABLE foot.dept
(emp_id NUMBER(9),
first_name VARCHAR2(40),
last_name VARCHAR2(50))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 10240 NEXT 10240 MINEXTENTS 1 MAXEXTENTS 500
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE hr_data1 ;
To this one when running a Data Pump Import operation:
CREATE TABLE foot.dept
( emp_id NUMBER(9),
first_name VARCHAR2(40),
last_name VARCHAR2(50));
The transformation feature provides numerous variations to allow administrators to tailor each object's storage characteristics to meet their specific needs.
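For reference, the parameter simply rides along on the import command line, something like this:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp tables=foot.dept transform=segment_attributes:n:table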
Generating DDL Files
Data Pump's Import utility can be used to create an output file containing DDL that is ready for execution. The DDL statements are not executed during the Data Pump operation so the target system will not be changed.
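The SQLFILE parameter drives this feature. The file names below are, as always, just placeholders:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp sqlfile=expfull_ddl.sql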
Target Table Exists Operations
DBAs have been using Export and Import to populate existing tables with data since the Export and Import utilities have been available. I was using Oracle when the original Export and Import utilities first became available. I used to be proud of this; now it makes me feel like I'm ready to retire. I have used Export and Import dozens of times to load existing data tables. These new existing-object features are certainly helpful.
Oracle has improved upon this functionality by providing administrators with four different options when loading preexisting tables (a quick example follows the list):
- TABLE_EXISTS_ACTION=SKIP - Existing tables are bypassed if they already exist. No action is taken.
- TABLE_EXISTS_ACTION=APPEND - Data Pump Import will append rows to the existing table.
- TABLE_EXISTS_ACTION=TRUNCATE - Data Pump Import will truncate the existing table and load rows from the source.
- TABLE_EXISTS_ACTION=REPLACE - Existing tables will be dropped, recreated and then loaded from the source.
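As promised, here is a quick example that refreshes the contents of an existing foot.dept table by truncating it before the load:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp tables=foot.dept table_exists_action=truncate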
Demo!
Let's take a quick look at the attached file that shows a few of the new features that Data Pump provides. I've added some comments along the way.
Wrapup
As I stated previously, the intent of this blog was not to be a primer on Data Pump administration and execution. Rather, it was intended to provide you with a laundry list of interesting and beneficial features to pique your curiosity. Hopefully this curiosity will spur you to learn more about these utilities (and read my upcoming blogs!).