Administering RAC Environments - An Interview with RAC expert Scott Rupnik
This provides Scott with a unique skill set that is pretty rare in our profession. In addition, since Scott has a strong background in both database and operating system administration, he is able to provide us with insight into both of these tightly coupled RAC components.
RAC stands for Real Application Clusters. The optimal RAC environment uses an interconnected set of servers with each server containing one to four CPUs. The servers are connected to a shared disk system using network-attached storage (NAS) or storage area network (SAN) technologies as the connectivity mechanism. High-speed network connections between the hardware servers themselves and from the hardware servers to the shared disk system allow end-users and administrators to view the environment as a single application architecture.
Real Application Cluster enhancements in Oracle10G allow hardware servers to be seamlessly added to an application. Oracle10G enhances Oracle Grid Control functionality to manage RAC environments with a single interface. Oracle's Grid Control is a web-enabled toolset that allows administrators to group hardware platforms, databases, and application server installations and manage them as a single entity. Administrators are able to call other utilities (Data Pump, Transportable Tablespace, Oracle's new job scheduler) from within Grid Control to modify, monitor and tune databases contained in the RAC environment.
Grid Control also simplifies RAC administration by automating the installation, configuration and cloning of Application Server 10G and Database 10G implementations across multiple nodes. Grid Control monitoring views the entire grid as a single unit and provides drill down capabilities to identify problems with individual components.
Question 1
How long have you been working with RAC?
I have been working with RAC environments for about two years now. My last employer was building a very large OLTP application on 9iRAC running on Sun Solaris servers. However, we were only using part of the functionality that RAC provides. We set up multiple instances, used our application server tier to do all of the load balancing, and relied on network hardware for failover.
At Remote DBA Experts, building and administering RAC environments is one of the most popular services that we offer. I currently provide primary remote database administration services for several clients that have RAC implementations. These implementations include Oracle 9i, 10.1 and 10.2 releases running on Windows and LINUX. The job is challenging and requires a lot of knowledge in many different areas of technology.
Question 2
RAC seems to be receiving a lot of good trade press lately. What is your personal opinion of RAC?
It really does provide a lot of benefits. The primary ones are availability and scalability. I'd say RAC, and the Cache Fusion technology that improves its performance, has been a huge success for Oracle. This is especially true with the 10.x releases. With RAC technology and commodity server hardware both rapidly evolving and maturing, RAC is becoming not only a viable alternative for smaller enterprises but also a preferred method of achieving their availability requirements.
Question 3
What is Cache Fusion?
Cache Fusion is essentially a memory-to-memory transfer of data between the nodes in the RAC environment. Before Cache Fusion, a node was required to write some of the data to disk before it could be transferred to the next node in the cluster. Cache Fusion does a straight memory-to-memory transfer. In addition, each node's SGA has a map of what data is contained in the other nodes' data caches.
The performance improvement is phenomenal. Oracle leverages the vendor's high speed interconnects between the nodes to achieve the cache-to-cache data transfers. Before Cache Fusion, when you added a node to the cluster to increase performance of the application, it didn't always provide you with the performance improvement that you hoped for. With Cache Fusion, you can easily cost justify the addition of another node into a RAC cluster to increase the performance of the application running on it. Oracle sales pitches describe it as 'near linear horizontal scalability'.
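As a rough illustration of the memory-to-memory transfer described in this answer, the toy Python model below tracks which node caches which block and counts how often a read is served from another node's cache rather than from shared disk. It is a teaching sketch only, not Oracle's implementation: the `Node` and `Cluster` classes, the single shared `directory`, and the block contents are all hypothetical simplifications.

```python
# Toy model only: real Cache Fusion relies on Oracle's global cache
# coordination, which is far more involved than this sketch.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # block_id -> data held in this node's buffer cache


class Cluster:
    def __init__(self, node_names, disk):
        self.nodes = {name: Node(name) for name in node_names}
        self.disk = disk          # block_id -> data on shared storage
        self.directory = {}       # block_id -> name of a node caching the block
        self.disk_reads = 0       # reads that had to go to shared disk
        self.cache_transfers = 0  # reads served from another node's memory

    def read(self, node_name, block_id):
        node = self.nodes[node_name]
        if block_id in node.cache:
            # Local cache hit: no disk I/O and no interconnect traffic.
            return node.cache[block_id]
        holder = self.directory.get(block_id)
        if holder is not None:
            # Another node already holds the block: copy it memory-to-memory.
            data = self.nodes[holder].cache[block_id]
            self.cache_transfers += 1
        else:
            # No node holds the block yet: read it from shared storage.
            data = self.disk[block_id]
            self.disk_reads += 1
            self.directory[block_id] = node_name
        node.cache[block_id] = data
        return data


cluster = Cluster(["node1", "node2"], disk={42: "row data"})
cluster.read("node1", 42)  # first access goes to shared disk
cluster.read("node2", 42)  # second node is served from node1's cache
print(cluster.disk_reads, cluster.cache_transfers)  # prints "1 1"
```

In this model the second node never touches the disk, which is the behavior the answer credits for the performance gain.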
Question 4
What are the three greatest benefits that RAC provides?
The three main benefits are availability, scalability, and the ability to use low cost commodity hardware. As I stated earlier, RAC is quickly finding its way into enterprises of all sizes to meet their availability and scalability requirements. RAC provides fault tolerance at a level that was out of reach for many organizations just a few years ago.
Prior to RAC, many organizations would run standbys to minimize outages in case of a hardware or operating system failure. The problem with standby database technology is that you will most likely lose transactional data during a failover. In addition, the failover itself can be time-consuming. RAC overcomes both of those problems.
In a RAC environment, if a node in the cluster fails, the application continues to run on the surviving nodes contained in the cluster. If your application is configured correctly, most users won't even know that the node they were running on became unavailable.
The scalability of a RAC environment goes hand in hand with the ability to utilize low cost commodity hardware. The x86/x86_64 platforms have achieved phenomenal price/performance ratios over the last couple of years. The problem was that they couldn't scale like their big iron proprietary UNIX server competitors.
RAC allows an application to scale vertically, by adding CPU, disk and memory resources to an individual server. But RAC also provides horizontal scalability, which is achieved by adding new nodes into the cluster. RAC also allows an organization to bring these resources online as they are needed. This can save a small or midsize organization a lot of money in the early stages of a project.
The key to achieving this level of scalability is Oracle's Cache Fusion technology, which we just talked about. Now that Oracle is able to leverage the hardware vendor's high speed interconnect technology, the cache-to-cache data transfer provides near linear horizontal scalability. If your application needs more horsepower, you add another low cost node into the RAC environment.
Question 5
What areas of RAC do you think need to be improved?
Although Oracle has made great progress in both of these areas over the last few years, I'd say installations and tools. I'd like to see an installation, including Oracle Clusterware, ASM, and the database, be performed in an hour or two. I would also like to see the installer be more intelligent.
Oracle is doing a very good job on RAC administration tools. They are maturing pretty quickly. But like all technicians, I am impatient. I want the most robust administration tools I can as quickly as possible. Oracle is certainly making headway.
The new 10.2 Grid Control tool has a robust reporting mechanism, which I think you covered in one of your blogs. Grid Control now provides administrators with a good view into ASM. In addition, Oracle's cluster file system, OCFS2, has just been included in the latest LINUX kernel.
I'd like to see Grid Control provide a more robust cluster-wide performance dashboard and an interface for creating and managing services. I think that would help a lot with getting to a true Grid computing environment.
Question 6
What is Clusterware?
One of RAC's requirements is that clustering software (sometimes called clusterware) be used to connect the hardware platforms together. This underlying clustering software was purchased either from the hardware vendor or a third-party clustering software provider. RAC is installed on top of the cluster environment and works in conjunction with the underlying clustering software to allow the application programs to view the multiple instances as a single entity.
One of the problems with previous releases of RAC was identifying exactly whose software was causing a problem in the first place. Was it the RAC software or was it the clustering software provided by the hardware or third-party vendor? Oracle10G solves this problem by providing its own clustering software called Portable Clusterware. Portable Clusterware can now be used in place of the hardware or third-party vendor's clustering software.
Question 7
How complex is RAC to administer?
RAC certainly brings its own set of challenges with it. You not only have multiple instances to administer but you may also have to administer multiple services. In addition, the DBA is increasingly being called upon to perform operating system and storage administration tasks for RAC environments.
The DBA needs to know more than just the Oracle database to be effective. I have the good fortune of being trained and previously employed as a LINUX and UNIX operating system administrator. That experience provides me with both hardware and operating system administration expertise. I found that knowledge to be invaluable. Operating system, disk storage and database administrators must work together to achieve a successful RAC implementation.
Oracle has been providing new and better tool sets for these tasks, from OCFS and ASM to Grid Control. It is very important for an organization considering a RAC implementation to perform an in-depth analysis to understand and plan for this added complexity early in the design and implementation process.
Question 8
Is Grid Control required for RAC administration?
Technically no, but it's difficult to be effective without it. It's hard enough to troubleshoot enterprise applications let alone RAC implementations that are far more complex. If you have an application that uses multiple instances, each running on a separate node, you can easily spend as much time jumping from one node to another as you do viewing and analyzing diagnostic and performance information.
The Grid Control software provides an overview of the complete cluster. In addition, it provides administrators with the ability to drill down into each of the nodes and database instances in the cluster. If Grid Control did nothing more than aggregate the information from multiple instances, it would be worth installing. It really provides a lot more features and functionality than reporting. Grid Control facilitates and simplifies RAC administration. You'll be wasting a lot of precious time if you use the command line interface.
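To make the idea of aggregating information from multiple instances concrete, here is a minimal Python sketch that reads Oracle's cluster-wide `GV$INSTANCE` view from a single session instead of logging in to each node. It is an illustration rather than a substitute for Grid Control. The function works with any Python DB-API style cursor. The `StubCursor` class and the instance and host names are hypothetical stand-ins so the sketch runs without a database.

```python
# GV$ views return one row per running instance, tagged with INST_ID,
# so a single query covers the whole cluster.
GV_INSTANCE_SQL = (
    "SELECT inst_id, instance_name, host_name, status "
    "FROM gv$instance ORDER BY inst_id"
)


def cluster_instance_status(cursor):
    """Return one dict per instance, using a DB-API style cursor."""
    cursor.execute(GV_INSTANCE_SQL)
    columns = ("inst_id", "instance_name", "host_name", "status")
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


class StubCursor:
    """Hypothetical stand-in for a real Oracle cursor (assumption)."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = None

    def execute(self, sql):
        self.executed = sql  # remember the statement instead of running it

    def fetchall(self):
        return list(self._rows)


# Made-up rows that mimic a two-node cluster.
stub = StubCursor([(1, "orcl1", "node1", "OPEN"), (2, "orcl2", "node2", "OPEN")])
for inst in cluster_instance_status(stub):
    print(inst["inst_id"], inst["instance_name"], inst["host_name"], inst["status"])
```

With a real connection, the cursor would come from whichever Oracle driver you use, and the same function would report every instance in the cluster in one call.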
Question 9
What is the most challenging aspect of RAC?
There are really two areas that make RAC a challenging environment to administer. First, there are a lot more moving parts. You need to understand the shared storage environment and the cluster layer to be able to troubleshoot them. You need to know where the cluster logs are, how to run fsck on an OCFS drive, etc.
Second, you need to realize that you have multiple instances to administer. In order to create an environment that provides the highest availability possible, you will have to implement Transparent Application Failover (TAF).
TAF automatically transfers application user connections to surviving nodes when their primary node, the node they were connected to, fails. Depending on how you configure TAF, users won't even know that their connection and work have been transferred to another server.
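TAF is configured on the client side, in the Oracle Net connect descriptor. As a minimal sketch, the Python helper below assembles such a descriptor with a `FAILOVER_MODE` clause. The host names, service name, and retry values are hypothetical examples. The helper itself is only an illustration of the descriptor's shape, not an Oracle-supplied API.

```python
def taf_connect_descriptor(hosts, service_name, port=1521,
                           failover_type="SELECT", method="BASIC",
                           retries=20, delay=5):
    """Build an Oracle Net connect descriptor that enables TAF.

    hosts        -- listener addresses of the RAC nodes (hypothetical names)
    service_name -- database service the application connects to
    retries      -- reconnect attempts after a failure
    delay        -- seconds to wait between attempts
    """
    # One ADDRESS entry per node so the client can reach any survivor.
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        f"(ADDRESS_LIST=(LOAD_BALANCE=yes)(FAILOVER=on){addresses})"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service_name})"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )


# Hypothetical two-node cluster and service name.
print(taf_connect_descriptor(["node1-vip", "node2-vip"], "orcl"))
```

With `TYPE=SELECT`, queries that were in progress can resume on the surviving node, but uncommitted transactions are rolled back and must be resubmitted by the application, so how invisible a failover is to users depends on the workload as well as the configuration.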
Question 10
How much operating system expertise is needed?
RAC is fundamentally just a clustering solution, albeit a highly specialized and tuned one. Even if you're lucky enough to have operating system and/or SAN administrators to help you, you'll still be required to have a basic understanding of all of the components. You must be able to understand what your operating system and disk storage administrators are talking about and what questions you need to ask during the installation and configuration process.
Question 11
What operating system do you prefer for RAC?
The most popular environment among our current clients is most definitely LINUX. We have several RHEL3 clients and have recently been adding RHEL4 to the mix. I definitely prefer LINUX and UNIX over Windows, and am happy with RHEL3 and RHEL4, but I'm looking forward to getting a test RAC environment set up on Solaris 10 now that Oracle has indicated that Sun, once again, is one of their preferred hardware vendors.
Question 12
Can RAC be installed by an experienced DBA with no previous RAC experience?
Sure, but a traditional DBA will face a steep learning curve. It's important that the DBA have a fundamental understanding of all of the different components that comprise a clustered environment. Your success will also depend on the support staff you have available to you. If you're lucky enough to have SAN and operating system administrators who understand clustering, it's a lot easier. If not, be prepared to do a lot of research up front to ensure that you understand all of the technologies involved and how they interface with each other.
It is certainly a lot easier if you have someone to help jumpstart your knowledge in this environment. That is why our RAC implementation and administration offerings are so popular. Our customers realize that RAC requires a unique knowledge set that combines database, disk storage and operating system administration expertise. We have people on staff that have extensive experience in all three of those disciplines. It makes the installation and administration of RAC environments easy for us. In addition, this expertise helps us when we are required to perform tuning and problem determination analysis, which is more complex in RAC environments.
Question 13
If you could provide advice to someone just getting started in RAC administration, what would it be?
The recommendations at the top of my list are to use the tools Oracle provides, use the Internet to learn as much as you can about RAC administration, and don't be afraid to learn how the operating and disk storage systems work. If you don't have the experience or expertise, call in someone who does to help.
The Grid Control product is the best tool available for viewing your systems at the cluster level. You must install the product and learn how to use it. There are a few administration tasks that are better suited to using a command line interface but the Grid Control tool should be your first choice when administering and troubleshooting a RAC environment.
The Internet, both official Oracle sites and sites like this one, contains a wealth of information on RAC installation, configuration, and administration. There are detailed guides to help you with most of the tasks you will be required to perform. Be sure to read as many Oracle Best Practices documents as you can. These will help you to solve problems in the planning stages of your project. The earlier you find potential issues during the project, the better off you will be. You certainly don't want to find out that you missed a critical configuration step after you implement a RAC environment in production. I will state that RAC can be unforgiving at times.
Finally, depending on the size and staffing of your organization, you must involve personnel that have operating system and disk storage administration expertise. The cluster and shared storage are the foundation of RAC, so you need to understand and be comfortable with both of them.
Question 14
What do you think the future will be for RAC?
RAC has gained a solid foothold and will be around for many years to come. Obviously the individual pieces, both the cluster services and shared storage systems, will get more mature and become more efficient. I also look for a more intelligent Grid Control product that will have the ability to not only monitor resources but also intelligently reassign them within a large enterprise GRID containing many RAC databases.
RAC is the foundation for Grid, which Oracle views as the future of enterprise computing. Instead of connecting multiple nodes together to support one individual application, you connect multiple nodes together to support many different applications. Whether Grid becomes a popular implementation is immaterial. As RAC continues to mature, it can only gain in popularity.