10G Data Pump Part 1 - Features and Benefits
Now that I've used Data Pump, I've found that the improvements go far beyond mere performance enhancements. In part 1 of this multi-part series on the Data Pump toolset, I'll cover some of the new features and functionality the utility provides. In upcoming blogs, we'll take a look at the Data Pump architecture and compare the toolset to its ancestors, Export and Import.
I thought I would start the series off by providing you with a laundry list of benefits the new utility provides. This is only a partial set of features. If you would like to chime in with a few of your own or you have some helpful hints (or warnings) to provide, please feel free to respond with a comment.
The intent of this blog is to generate interest in the Data Pump utilities. In my next blog, I'll continue our discussion on Data Pump by providing you with a general overview of its architecture and finish the series by showing you how to use Data Pump. I'll provide a few examples and screen shots to enhance the learning process along the way.
Before we begin let's look at a very quick example of a Data Pump Export command line:
expdp foot/foot directory=export_dir dumpfile=expfull.dmp logfile=expfull.log full=y
"expdp" tells the operating system to run the Data Pump Export utility. The parameters that follow are used to control the job's execution. In our example, the account and password are both "foot", the file that will contain the output data is "expfull.dmp", the logfile is"expfull.log" , and a full database export will be performed.
Most of the parameters are simple, but a further explanation of the DIRECTORY parameter is in order. Because a Data Pump job is executed in the database, you must create a directory object using a SQL statement on the database being accessed. If you don't pre-create the directory, Oracle provides a default directory called DATA_PUMP_DIR. That means users are unable to fully qualify the output file and log file as they were able to do in the non-Data Pump versions of Export and Import. You specify a fully qualified directory on the server when you create the new database directory object and then you must grant access to that directory. I'll show you examples of this in next week's blog.
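As a quick preview, the two statements involved look something like the ones below. The path is purely an illustration for this blog's environment, and they need to be run from an account with the CREATE ANY DIRECTORY privilege (SYSTEM, for example):
CREATE OR REPLACE DIRECTORY export_dir AS '/u01/app/oracle/export';
GRANT READ, WRITE ON DIRECTORY export_dir TO foot;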
The Oracle manual provides an entire listing of parameters that can be used as input to the Data Pump Export and Import utilities. I can choose to export a single table or an entire schema, or parallelize the job's execution, to name just a few of the options available. I can exclude objects, include only a set of objects and transfer data across the network directly from one database to another. Let's get started by looking at just a few of the benefits Data Pump provides.
Performance Benefits
Oracle documentation states that Data Pump's performance on data retrieval is 60% faster than Export and 15 to 20 times faster on data input than Import. I have run some simple tests to roughly compare the performance of the older versions of Export and Import to their Data Pump counterparts (also called Export and Import) and have found the claims to be pretty close. But as with all such claims, your performance may vary - as mine did. I will say that Data Pump is very fast compared to its ancestors. To keep the discussion from getting confusing, I'll refer to the 10G versions of Export and Import as Data Pump Export and Data Pump Import.
Oracle documentation states that the following factors contribute to the performance improvements:
- Parallelism - Multiple worker processes are able to perform inter-table and inter-partition work in parallel. This allows administrators to load and unload database tables using parallel processing. Data Pump Import takes advantage of parallelism to build indexes and load package bodies.
The most beneficial aspect of Data Pump's parallel processing feature is that it is dynamic. Administrators are able to attach (connect) to a Data Pump job that is currently executing and dynamically alter the number of parallel processes it uses. I'll show a quick command-line example right after this list, and I'll show you how to dynamically activate parallelism in the demo at the end of this blog.
- DBAs that ran Export and Import remotely (not logged on to the target server) were required to transfer data to their local machines. Data Pump Export reads and writes files directly to the server and does not require any data movement over the network to the client machine. Data Pump jobs are scheduled and processed within the database engine as opposed to being a separate client-side process.
- The format of the Data Pump Export output files is very similar to the format of the data stored in the Oracle tablespace datafiles. Sounds pretty logical to me. If you want to transfer data from a table to a flat file and back into another table, it makes sense to keep the format of the output datafile as close to the format of a database datafile as possible.
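To give you a feel for it, a parallel version of the earlier full export might look like the line below. The degree of 4 is just an illustrative value, and the %U substitution variable lets Data Pump create one dump file per worker process:
expdp foot/foot directory=export_dir dumpfile=expfull%U.dmp logfile=expfull.log full=y parallel=4
Once attached to the running job (attaching is covered next), the interactive PARALLEL command changes the degree on the fly:
Export> PARALLEL=8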
Attaching and Detaching
Because Data Pump runs in the database, you can detach the client from a Data Pump job and reattach it at any time to monitor and/or change execution parameters. The reattach can be performed as often as required during the job's lifecycle. If there are numerous jobs running, you attach to a specific job by specifying the job's name in the ATTACH parameter.
This feature allows administrators to start a Data Pump job, detach from it and attach to it from time to time to monitor its execution. Detaching from an active Data Pump job will feel a little strange at first because you have to hit the CONTROL-C combination that we all use to try and break out of something that is stuck. The first time I did it, I thought "I'll break out of this and the job will stop." Wrong. As you will see in the demo, you can attach and detach as often as you would like.
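For example, assuming the full export above received the default job name SYS_EXPORT_FULL_01 (your job name may differ - DBA_DATAPUMP_JOBS will tell you), reattaching from the command line looks like this:
expdp foot/foot attach=SYS_EXPORT_FULL_01
Once attached you land at the interactive Export> prompt, where EXIT_CLIENT detaches again while leaving the job running on the server.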
Job Status Information
Oracle provides the STATUS command to allow administrators to view the status of the job they are attached to. It provides virtually all of the information needed to monitor the job's execution. In addition, you can use STATUS=n (where n is a number of seconds) to obtain a status display at timed intervals.
You can also use DBA_DATAPUMP_JOBS to view general information on all active Data Pump jobs and DBA_DATAPUMP_SESSIONS to view information on user sessions that are attached to active Data Pump jobs. In addition, administrators are able to use the tried-and-true V$SESSION_LONGOPS to review how much work the Data Pump job has completed and how much more work it needs to perform.
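A couple of simple queries along those lines (column lists trimmed down for readability):
SELECT owner_name, job_name, operation, job_mode, state, degree FROM dba_datapump_jobs;
SELECT opname, sofar, totalwork FROM v$session_longops WHERE totalwork > sofar;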
Space Estimation
Administrators are able to estimate the size of a Data Pump Export output file by using the ESTIMATE and ESTIMATE_ONLY parameters. I have heard comments from some administrators that the estimates can be time consuming. Although I haven't performed this command on very large sets of data, it seemed to work pretty well for me. The ESTIMATE parameter has two options:
- ESTIMATE=STATISTICS - Estimates the size of the output file by using the data dictionary statistics associated with the objects being exported. The statistics will need to be current for this option to be accurate.
- ESTIMATE=BLOCKS - Estimates the size of the output file by multiplying the number of blocks each object being exported consumes times the database block size.
You would place the parameter above on the command line. But if you only want to estimate the size of the output file, you'll need to remember to use the "ESTIMATE_ONLY=y" parameter in conjunction with the ESTIMATE parameter. If you don't, you will estimate the size of the output file and then perform the export as well.
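A schema-level, estimate-only run for the foot schema would look something like the line below. Notice there is no DUMPFILE parameter - none is allowed when ESTIMATE_ONLY=y because no dump file gets written:
expdp foot/foot directory=export_dir logfile=estimate.log schemas=foot estimate=blocks estimate_only=y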
Interruptible and Restartable
Data Pump jobs update a master table with all pertinent information about the job being executed. The master table knows about the objects being exported or imported and their location in the dump file set. As a result, a Data Pump job can be restarted in the event of a planned or unplanned stoppage. The Data Pump job accesses the master table to determine where to start and picks up right where it left off. Administrators are able to use the STOP_JOB and START_JOB commands on the client to temporarily stop jobs and restart them when desired.
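Sticking with the hypothetical SYS_EXPORT_FULL_01 job name from earlier, a stop-and-restart sequence looks roughly like this:
expdp foot/foot attach=SYS_EXPORT_FULL_01
Export> STOP_JOB=IMMEDIATE
Later, reattach the same way and issue:
Export> START_JOB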
Database to Database Transfers
Data Pump Import allows users to transfer data directly from a source database to a target database without using disk as a temporary staging (and stopping) point. You use the NETWORK_LINK parameter to specify a pre-created database link that points to the source database. The server connects to the source database specified in the database link, retrieves the data and writes it directly to the target database. No dump files required!
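Sketching it out with made-up names (a database link called source_db pointing at a TNS alias SOURCEDB), the pre-created link and the import would look something like this:
CREATE DATABASE LINK source_db CONNECT TO foot IDENTIFIED BY foot USING 'SOURCEDB';
impdp foot/foot directory=export_dir network_link=source_db schemas=foot logfile=netimp.log
The directory object is still needed for the log file, but notice that there is no DUMPFILE parameter.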
Recreates Schema Users and Grants in Schema Mode
As stated previously, users have the option of choosing to export an entire schema. Let's say you wanted to refresh an entire schema in a target database from an export file you have available. The easy way to create all of the tables from scratch is to drop the user with the cascade option, recreate the user, grant the user the privileges they used to have and use the Import utility to recreate the schema objects. The problem with the older version of Import was that because it didn't recreate the user (just the user's objects), the administrator was forced to save creation and grant statements to recreate the user with the same privileges it had before the drop was performed. A Data Pump schema import will recreate the user and execute all of the associated grants, user password history, etc.
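A schema-mode refresh along those lines, run from a suitably privileged account so the FOOT user itself can be recreated (and assuming expfull.dmp contains the user definition), would look roughly like this:
impdp system/password directory=export_dir dumpfile=expfull.dmp schemas=foot logfile=impfoot.log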
Fine-Grained Selectivity
Being able to narrow the scope of the work performed is critical to most of the administrative functions we perform. The tighter we can focus on the set of objects we are administering, the more efficient and safer our administrative activities will be. Data Pump provides several parameters that help administrators narrow the scope of objects being processed (a few command-line examples follow the list):
- EXCLUDE - Notifies Data Pump to exclude the specified object type from the export or import operation. Multiple excludes can be used together to exclude more than one object type from the operation. For example, "EXCLUDE=function" would exclude all of the functions from a Data Pump operation. SQL-like syntax can be used to provide even greater selectivity.
- INCLUDE - Notifies Data Pump to include only those object types specified. All other object types will not be included in the export or import operation. Like its EXCLUDE counterpart, a SQL-like syntax can be used to provide greater selectivity.
- CONTENT - Notifies Data Pump to include only the metadata (CONTENT=METADATA_ONLY), only the data (CONTENT=DATA_ONLY) or both metadata and data (CONTENT=ALL).
- QUERY - Provides users with the ability to use SQL statements to limit the rows processed by the Data Pump operation. A limited version of this feature was available in the older releases of Export and Import. The QUERY feature has been improved by allowing the statement to be qualified with a table name to apply the statement to a specific table. In addition the QUERY option can now be used in both Data Pump Export and Import operations.
- FLASHBACK (the FLASHBACK_SCN and FLASHBACK_TIME parameters) - Allows Data Pump Export and Import to restrict data to a specific time in the past. Being able to go "back in time" and retrieve data seems to be an extremely valuable Data Pump feature to me. One example would be to view before images of data that was changed incorrectly. If an application program runs amok and changes data erroneously, the Data Pump utility could be used to go back in time and populate data in a different schema to allow the development team to identify the differences at their leisure.
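Here are the promised examples. The object types, table and WHERE clause are just placeholders, and depending on your operating system the quotation marks in the QUERY value may need escaping (or you can tuck the whole thing into a parameter file with PARFILE):
expdp foot/foot directory=export_dir dumpfile=noprocs.dmp schemas=foot exclude=function exclude=procedure
expdp foot/foot directory=export_dir dumpfile=meta.dmp schemas=foot content=metadata_only
expdp foot/foot directory=export_dir dumpfile=dept.dmp tables=dept query=dept:"WHERE emp_id > 1000"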
Transformations
Older versions of Export and Import did not provide users with the ability to change an object's storage characteristics. For example, there wasn't a real simple way to export a set of objects from one tablespace and import them into another tablespace. There were a few tricks that could be performed to make this happen, but they certainly weren't as easy or as straightforward as Data Pump's transformation features.
Data Pump allows users to easily change the target tablespace that the objects will be created in (REMAP_TABLESPACE), migrate objects from one schema to another (REMAP_SCHEMA) and move databases across platforms that have different file system semantics (REMAP_DATAFILE).
The TRANSFORM parameter can also be used to alter object creation DDL for specific objects. For example, this syntax:
TRANSFORM=SEGMENT_ATTRIBUTES:n:table
Could be used to change this object creation statement:
CREATE TABLE foot.dept
(emp_id NUMBER(9),
first_name VARCHAR2(40),
last_name VARCHAR2(50))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 10240 NEXT 10240 MINEXTENTS 1 MAXEXTENTS 500
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE hr_data1 ;
To this one when running a Data Pump Import operation:
CREATE TABLE foot.dept
( emp_id NUMBER(9),
first_name VARCHAR2(40),
last_name VARCHAR2(50));
The transformation feature provides numerous variations to allow administrators to tailor each object's storage characteristics to meet their specific needs.
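For reference, the parameter simply rides along on the import command line, something like this:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp tables=foot.dept transform=segment_attributes:n:table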
Generating DDL Files
Data Pump's Import utility can be used to create an output file containing DDL that is ready for execution. The DDL statements are not executed during the Data Pump operation so the target system will not be changed.
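The SQLFILE parameter drives this feature. The file names below are, as always, just placeholders:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp sqlfile=expfull_ddl.sql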
Target Table Exists Operations
DBAs have been using Export and Import to populate existing tables with data since the Export and Import utilities have been available. I was using Oracle when the original Export and Import utilities first became available. I used to be proud of this; now it makes me feel like I'm ready to retire. I have used Export and Import dozens of times to load existing data tables. These new existing-object features are certainly helpful.
Oracle has improved upon this functionality by providing administrators with four different options when loading preexisting tables (a quick example follows the list):
- TABLE_EXISTS_ACTION=SKIP - Existing tables are bypassed if they already exist. No action is taken.
- TABLE_EXISTS_ACTION=APPEND - Data Pump Import will append rows to the existing table.
- TABLE_EXISTS_ACTION=TRUNCATE - Data Pump Import will truncate the existing table and load rows from the source.
- TABLE_EXISTS_ACTION=REPLACE - Existing tables will be dropped, recreated and then loaded from the source.
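As promised, here is a quick example that refreshes the contents of an existing foot.dept table by truncating it before the load:
impdp foot/foot directory=export_dir dumpfile=expfull.dmp tables=foot.dept table_exists_action=truncate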
Demo!
Let's take a quick look at the attached file that shows a few of the new features that Data Pump provides. I've added some comments along the way.
Wrapup
As I stated previously, the intent of this blog was not to be a primer on Data Pump administration and execution. Rather, it was intended to provide you with a laundry list of interesting and beneficial features to pique your curiosity. Hopefully this curiosity will spur you to learn more about these utilities (and read my upcoming blogs!).