Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Blogs » Chris Foot Blog » Chris Foot's Oracle10g Blog » The Non-Technical Art of Being a Successful DBA – Paranoid DBA Best Practices
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3623
 

The Non-Technical Art of Being a Successful DBA – Paranoid DBA Best Practices The Non-Technical Art of Being a Successful DBA – Paranoid DBA Best Practices

Ever look at a screen’s output and get that puckered feeling in the pit of your stomach? If you have been working in this profession for any amount of time, you know the feeling I’m talking about. The feeling that makes you think you would rather be living in Montana making woodcarvings at a roadside stand than being a DBA. I’ll be taking a somewhat lighthearted look at the perils of our profession and discuss ways to reduce problem occurrences.

The Perils of our Profession
One of the common challenges that all DBAs face, no matter what vendor's database they work on, is the absolute attention to detail our profession demands. Switch a couple of characters in a script, forget to set your SID, set the wrong flag at the wrong time and the end result usually isn't very pretty. Many commands we issue on a regular basis are destructive by their very nature. This is the reason why I have a great respect for all technicians who have selected database administration as their chosen profession.

I know they have all experienced that uncontrolled "eye-twitching" at 2 AM when they are ready to hit the final enter key to execute the command. You know what command I'm talking about too. It's that one command that you really, really, really hope is going to come back with a successful return code and ultimately end with a database that is finally usable. Whether it's a recovery, a file fix or corrupt data is immaterial, it's the wait that we are talking about.

There is no longer wait in our profession than waiting for the message below after a database recovery:

SQL> Database opened.

Time always seems to stand still. The longer the recovery, the messier the recovery, the more critical the database - the longer you wait. You stare at the screen hoping beyond hope that the above message will appear. It's the ritual cross your fingers, spin around three times, face towards Oracle headquarters and pray to everything that is Larry Ellison wait. I've actually caught myself mumbling "come on, please open, please open…" I don't care how sure you are of your capabilities - you know the anticipation I'm talking about.

You then either breathe a sigh of relief or you are in absolute disgust when you see an Oracle error message appear. How about the old "File 1 needs more recovery to be consistent" or the "File 2 not restored from a sufficiently old backup"? Those messages are enough to make anyone cringe. I'm an ex-Oracle instructor, I've seen those messages A LOT in class. I still cringe.

At a previous job, I once had to run through 36 hours of tapes to restore a multi-terabyte warehouse. A disaster occurred that required us to do a recovery. THAT was the longest wait for a database open message I ever experienced. One of my fellow DBAs asked if I needed a brown paper bag to breathe into.

Or it's the command that drops the schema in the test environment that will allow you to do a refresh from production. It's that test database that runs on the same box as production. The environment that makes you do a "SELECT name FROM V$DATABASE" command in SQL*PLUS about 15 times in a row before you finally execute the "DROP USER CASCADE" statement.

Not only must we try to prevent our own mistakes, we must safeguard our environments against the mistakes of others. Operating system administrators, disk storage technicians and application developers are just like us. We are all part of the human community that makes mistakes from time to time.

If you never make mistakes, send me a resume. I'm always looking for a "Patron Saint of Oracle" here at Remote DBA Experts. It will also save us on travel costs because I'm sure you'll be able to spread your wings and fly here on your own.

But as my old boss Dan Pizzica used to tell me (when I was a VERY junior DBA) "It really doesn't make a difference who broke the database. You are the technician who is ultimately responsible for fixing it. The buck stops with you. If you can't protect your environments, you aren't doing your job." We all know he's absolutely correct.

Then there's the software glitches. The problems that pop up out of the blue and make you go:

"WHAT THE ? - how did THAT happen? I've done this 317 times in a row and it worked every time."

For you math majors, here's my calculation for this:

  CLOSER YOU ARE TO PRODUCTION TURNOVER
+
THE GREATER THE VISIBILITY OF THE PROJECT
=
THE MORE LIKELY A PREVIOUSLY UNKNOWN SOFTWARE GLITCH WILL OCCUR

I don't care what software you are using, you will run into the "only occurs on this release, on this version of the operating system, using this particular feature on the third Tuesday of the sixth month when it's cloudy outside" BUG. Be sure to expect management to stop by and ask "well why didn't you test this on the third Tuesday of the sixth month when it was cloudy outside?"

The more complex the database ecosystem, the more paranoid I become. Which is why I'm not a follower of "the database is getting so easy - we won't need DBAs" mantra that mindless industry pundits profess on a seemingly endless basis.

So now we know that our jobs are somewhat unforgiving and we do make a mistake from time to time. What can we do to reduce the chance of an error occurring?

The Second Set of Eyes
As I have stated in previous blogs, I have over 20 years of experience using Oracle and have done my fair share of database backups and recoveries. During my career as an Oracle instructor, I have assisted in hundreds of database recoveries in Oracle's classroom environments. If possible, I still have others review my recovery strategy and recovery steps before I begin the recovery process. I used backup and recovery just as an example. Whatever the process is you are are performing, a second opinion may prevent you from making a mistake. A review from a fellow DBA has saved me more than once. I may be described as having an ego (I have no idea where they get that opinion) but it doesn't prevent me from asking for help from others.

We were recently correcting a very poor backup script created by a customer's previous database support vendor. The customer described this particular environment as "if it goes down we lose our ability to make money" application. After our massive set of changes were complete, two of us went line-by-line verifying each line of the backup script. At the end of each script we asked each other "are you OK with this?". Only then did we move on to the next one. I don't care how much time you have "in the seat" using Oracle, you need to put your ego aside at times and have someone check your work on critical activities.

Concentration
I used to work for a shop that subscribed to "the everybody in one big room" philosophy. I guess it was supposed to allow everyone to work together as a team and become as "one with each other". It may have achieved that purpose but it sure didn't allow you to concentrate on your work very well. You could hear so many different conversations they had to pump white noise in. The constant 'whhhsssssshhhssshhh" noise made me feel like I was a crewmember of the Starship Enterprise. I always wondered when Captain Picard would walk through.

Like all DBA units, our particular area was often populated with various developers and O/S technicians. Many different conversations were occurring, some that could be described as somewhat animated. The environment did not allow you to concentrate on the task at hand. We often had to go into small conference rooms to work on critical tasks.

The point I'm trying to make is that no matter what type of environment you work in; if you can concentrate OK, but if you are like me and you can't, find a spot where you can. Block off some time, send questions to other DBAs and concentrate on the task at hand. Don't attempt to answer questions and code a complex script at the same time. May seem obvious, but throughout my career I have personally watched numerous DBAs attempt to multitask when they are working on a critical process. It's a recipe for a problem. Once you are done, follow rule number one and have someone review your work.

What Database Are You Working IN?
Working in the wrong database is a common problem for database experts as well as their less experienced counterparts. How many times have YOU found yourself running statements in the wrong environment? Feel free to include me in that not so select group. The operating system command SET can be used in Windows systems to display environment variables. The ENV command can be used to display the environment variables in UNIX. Many seasoned database administrators change their UNIX shell prompt in their.profile to display the current Oracle SID. Displaying the current Oracle SID in the shell's prompt provides a continuous reminder to the DBA of the database they are working in.

The entire paragraph above was from a recent blog I wrote on the new capabilities of 10G SQL*PLUS. I highly recommend you read this blog and use the 10G Version of SQL*PLUS to take advantage of this safeguard.

Setting your SID Automatically During Logon
Here's an easy recommendation - don't do it. Many of the servers we work on contain several databases that use different database software releases. It is our standard to not set environments in logon scripts. We create shell scripts that have the database name in their title to set our SID, ORACLE_HOME, etc..

It is a little harder in Windows environments. During installation, Oracle often uses the first database created on the server as the default SID. You can override this configuration, but you'll need to read the Oracle installation guides to find out how. The manuals will show you how to override the default database on Windows operating systems.

Saving Time VS Creating a Problem
I once watched a fellow DBA perform a rather complex set of administrative tasks to solve a problem. He was rapidly flipping back and forth between at least 15 active screens, copying and pasting and editing and copying and pasting and editing… I describe this particular activity as "Multiple Screen Syndrome". He also had several other screens open that were connected to other databases. He was multi-tasking to its highest degree.

How about the rm -r /u0*/ora*/prod*/*/*.* command in UNIX? It's the command that drops multiple databases in multiple directories. All in one painful swoop. How many times have you heard of a mistake caused by commands like this causing mass mayhem? When you make a mistake like this, you become immortalized in conversations for years to come. Get a few technicians together after work and ultimately the conversation will include "remember when Bob so-and-so ran that big rm -r command by mistake and wiped out the entire O/S on our production web server? You can't tell me you haven't heard stories like this.

My opinion as a database manager is that I would rather you take your time than showcase your multi-tasking and time saving skills. The more complex and critical the activity, the more basic you should become in your plan of attack.

Safety First Mindset
I once saw a DBA log in to a database using a particular schema account. He then logged into a different database using a DBA account and dropped the schema with the same name he was logged into on the first database. I asked him why he logged into the first database using the schema account he just dropped in the second. He stated "Oracle won't let you drop a schema that is connected. No matter what happens after this, I'm positive that I won't drop the user in this database by mistake." I like that Safety First mindset in a DBA.

You need to think Safety First when you are performing any particular complex or critical activity. Take the time and put one or two safeguards in place like the DBA did when he dropped the user.

Other DBAs may call you paranoid, I'll call you an experienced DBA that would rather be safe than sorry.

Wrapup
The intent of this blog was to not provide you with a laundry list of recommendations. It was intended to help jumpstart your creative juices to think about different methods to protect yourself against problems. If you have any helpful hints, please feel free to respond to this blog with your Safety First Tips and Tricks.

Thanks for Reading,

Chris Foot
Oracle Ace


Monday, September 18, 2006  |  Permalink |  Comments (2)
trackback URL:   http://www.dbazine.com/blogs/blog-cf/chrisfoot/blogentry.2006-09-17.9516775168/sbtrackback

SID paranoia

Posted by bbreidenbach at 2006-09-21 12:31 PM
Most of my shell scripts (especially any with destructive capabilities) require the SID to be passed to them or they error out with a syntax help message. A junior dba told me he'd really prefer to have the scripts just pull $ORACLE_SID if one wasn't supplied -- and could I do that for him from now on? (I think he thought I didn't know how....) As a fellow paranoid, you can guess my answer. ;-)
 

Powered by Plone