The Non-Technical Art of Being a Successful DBA – Database Recovery Best Practices
This blog will focus on the most important responsibility we are charged with as DBAs - ensuring that our organization’s databases can be quickly and easily recovered.
Introduction
Let me begin by stressing how strongly I feel about this topic. As I stated
in my first blog of this series, I always started my Oracle backup and recovery
class with this statement: "The fastest way to lose your job in this profession
is to lose data for your company. You can be a Tom Kyte and a Jonathan Lewis
X 2, but if you can't recover a database, you aren't of any use to your employer."
That always seemed to ensure that my students paid attention during the remainder of the class. The backup and recovery class was one intense week of instruction that consisted of me pounding information and best practices into my students' collective heads. I would often stop after an important topic and bellow "DO YOU UNDERSTAND?!?!!" as loud as I could. By the end of the week, the class would immediately yell back "YES!!!" as loud as they could.
What can I tell you? That style worked for me. Trust me when I say that when my students left that class, they could all back up and recover an Oracle database. The backup and recovery classes were the ones responsible for my courses being labeled as "Foot Camp" by the Oracle student population. That was OK by me.
My first "mentor", if you could call him that, was an ex-Marine Corps drill instructor who went into IT after he retired. When I started my career, he was the first senior DBA I worked for. Every so often, he would walk up to the back of my chair as I was facing my terminal, lean in real close to my ear and say "You know what, Foot, the next time I see you make a mistake, I'm not even going to tell you. I'm just gonna wait 5 minutes then come back and kick your *&^%$ up around your shoulder blades." Not motivational for sure, but I made few mistakes. Maybe some of that rubbed off...
Oracle Backup and Recovery
Recovering an Oracle database is a wonderfully complex task. Data files, log files, control
files, full backups, hot backups, RMAN and point-in-time recoveries all combine
to make many administrators lie awake nights thinking about whether their databases
can be easily recovered (or not).
The next few sections will provide some useful information on the Oracle backup and recovery process. My intent is not to cover any technical topics in depth. You can get that information from a myriad of sources. My focus will be on the non-technical tips and tricks that will help you improve your recovery skills and ensure trouble-free recoveries.
It's the Little Things That Bite You
Most botched recoveries can be attributed to human error. Make sure all tapes
have proper retention periods, verify that all backups are executing correctly
and run test recoveries on a regular basis. Don't let missing tapes or backups
cause you to lose data. You don't want to hear UNIX support say "the retention
on that tape was supposed to be how long?" in the middle of a recovery.
COMMUNICATE with others who are responsible for all other pieces of the recovery
"pie" (system admins, operators) on a regular basis to ensure you
have everything you need to recover a crashed database. Pick a database, identify
the backup output files and verify that they are available when you need them.
Remember, YOU are the technician who is ultimately responsible for ensuring
that your organization's databases can be recovered. Not O/S support, not operations,
not application developers.
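As one way to make "verify that they are available" a routine check rather than a good intention, here is a minimal sketch in Python. It assumes your backups land as ordinary files on disk and that you already know the paths you expect to find; the function name, the status labels, the 24-hour default and the example path are all hypothetical, not part of any Oracle or RMAN tooling. It only confirms that a file exists, is not empty and is recent. It does not prove the backup is restorable, which is what test recoveries are for.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class BackupCheck:
    path: str
    status: str  # one of "ok", "missing", "empty", "stale"


def check_backup_files(paths, max_age_hours=24, now=None):
    """Classify each expected backup file as ok, missing, empty, or stale.

    paths         -- backup output files you expect to exist (assumed known)
    max_age_hours -- how old a file may be before it is reported as stale
    now           -- current time in seconds since the epoch (defaults to time.time())
    """
    now = time.time() if now is None else now
    results = []
    for path in paths:
        if not os.path.isfile(path):
            results.append(BackupCheck(path, "missing"))
            continue
        info = os.stat(path)
        if info.st_size == 0:
            status = "empty"
        elif now - info.st_mtime > max_age_hours * 3600:
            status = "stale"
        else:
            status = "ok"
        results.append(BackupCheck(path, status))
    return results


if __name__ == "__main__":
    # Hypothetical path; replace with the backup output files for your database.
    expected = ["/backups/PROD1/full_backup.bkp"]
    for result in check_backup_files(expected):
        print(f"{result.status:8} {result.path}")
```

Anything reported as missing, empty or stale is a conversation to have with the system admins and operators before a recovery, not during one.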
Document Your Recovery Environment!
OK, I'm yelling already. I already covered the importance of good documentation
in a previous blog. You know by now that I work for a remote database services
provider. It is absolutely imperative for us to know EVERYTHING about our customers'
existing backup and recovery strategies. Part of our assimilation process is
to document each customer's environment. Here's a quick
list of some of the questions we ask. The document we actually use is a
standardized Word template that uses drop-downs, text boxes and help buttons
but this shortened text document should provide you with a starting point to
help you build your own backup and recovery documentation library.
Keep Your Skills Sharp
Don't let your recovery skills get rusty. The more test recoveries you do, the
easier the production recoveries become. Create one database that you and your
fellow administrators can trash on a regular basis. Take turns and make a game
out of it. DBAs can be pretty creative when causing problems for others when
it's all in fun.
It can actually become quite an interesting game competing for bragging rights over who has the current title of "the most devious database destroyer." During one of my test recoveries, I couldn't even bring up the monitor. I waltzed down to the server room and saw an open drive bay on our test server with a bundle of unconnected wires sticking out. There was a single note attached below the opening telling me to look for the next note. Fifteen notes later, I found the drive. Dumped it in, fired it up and found that the database was deleted. I recovered it from a tape backup. THAT was devious.
If you are a senior-level DBA, make sure you keep the junior folks on their toes. I have never personally seen the database make a mistake during the recovery process. That leaves incomplete backups and DBA error as the most likely causes of "good recoveries gone bad."
At RemoteDBAExperts, we have dozens of customers that we have to support. We test recoveries and failovers on a regular basis. Not four hours ago, we had three of our folks perform a cold database backup and recovery on a Linux platform. Ensuring our recovery skills are sharp is that important to us. I still do test recoveries. It's important to me to ensure that I am ready to go when the time comes. If it were up to me, I would have our receptionist test her recovery skills too.
RELAX and Plan Your Attack
When you are notified of a database failure, take a deep breath and relax. Don't
immediately begin to paste the database back together without a plan. Create
a recovery plan, put it on paper, have others review it if you can, and then
execute it. You shouldn't be trying to determine what the next step is in the
middle of the recovery process. I will plan my attack on paper for all recoveries,
no matter how simple they are.
Don't Be Afraid to Ask Others
I have over 20 years of experience using Oracle and have done my fair share
of database backups and recoveries. During my career as an Oracle instructor,
I have assisted in hundreds (and hundreds) of database recoveries in Oracle's
classroom environments. If possible, I still have others review my recovery
strategy and recovery steps before I begin the recovery process. A second opinion
may prevent you from making a mistake or overlooking a key part of the recovery
process.
Don't be afraid to ask others and don't be afraid of calling Oracle support if you have to. That's what they get paid by your company to do - support you. Don't make a database unrecoverable by "guessing." When I first took over as the Database Group Manager for a large financial organization many years ago, I viewed the execution of over 70 different commands in an alert log after a botched recovery performed by a junior DBA. An ego that was too big to allow that person to ask questions created a database that was almost unrecoverable.
The Importance of Formal Education
Read the Oracle Backup and Recovery Guides before reading third-party books.
The manuals will provide you with a firm foundation of knowledge on backup and
recovery strategies and procedures. Then move on to third-party books (like
this one) for helpful hints and tips that may assist you in the recovery process.
Take the Oracle classes! Oracle's instructors understand the importance of backups and recoveries. You will receive days of instruction and hours of hands-on labs. You'll learn everything from simple O/S cold backups to RMAN incomplete recoveries using backup control files.
Oh, and now that I'm retired from teaching, you won't have to worry about me yelling at you.
Thanks for Reading,
Chris Foot
Oracle Ace
Also, you mention a third-party book but not its title - I think you were going to provide a link to it but neglected to do so. Is it your book (which I think is excellent - http://www.amazon.com/exec/obidos/redirect?tag=mullinassoci-20&creative=9325&camp=211189&link_code=as2&path=ASIN/0974435538 ) or were you going to reference an Oracle Backup and Recovery book like this one - http://www.amazon.com/exec/obidos/ASIN/0072263172/mullinassoci-20/102-4833761-4300911?%5Fencoding=UTF8&camp=1789&link%5Fcode=xm2
Keep up the great work on this blog Chris... I always enjoy reading it!