High Availability
Oracle Application Server Tips by Burleson
Consulting
In a nutshell, high availability means that
all Oracle Application Server 10g components are readily available
to the end-user community. However, keep in mind that high
availability is a relative term, and there is a direct trade-off
between computing resources and availability.
If your system must have continuous
availability, even in the face of a disaster, then expensive (more
resource intensive) failover mechanisms must be implemented in your
application, and additional failover servers are required. If your
system can tolerate occasional downtime, then less aggressive
failover techniques can be used.
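To make "relative term" concrete, the short sketch below converts an availability target into the downtime it permits per year. The target percentages are illustrative assumptions, not figures from Oracle; the arithmetic is simply the fraction of a 365-day year during which the system may be down.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; pick the one your business can justify.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% available allows {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why the cost of the failover mechanisms rises so quickly as the target approaches continuous availability.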
There is also a trade-off between
recovery time and expense. Since Oracle introduced recovery products
12 years ago, their technologies have evolved significantly. The
options range from recovery that can take hours to true continuous
availability. At the database level these techniques range from
traditional database recovery, to standby databases, all the way to
Real Application Clusters. As discussed in Chapter 2, the
infrastructure repository database can be created in an existing
Oracle 9i database (or, once certified, a 10g database), including one
that uses RAC to provide continuous availability.
In this chapter we will focus on the
high availability features of Application Server 10g. For
information on high availability options for Oracle back-end
databases, refer to the books Oracle Database 10g DBA Handbook by
Kevin Loney and Bob Bryla (McGraw-Hill/Osborne Media, 2004) or
Oracle Database 10g High Availability with RAC, Flashback, and Data
Guard by Matthew Hart and Scott Jesse (McGraw-Hill/Osborne Media,
2004).
We will start the discussion with the
application server overall and then work down to the high
availability features of the separate components.
Why Are Systems Unavailable?
First, what is considered system
availability? A system is available if it accepts and processes
end-user requests. Basically, if an end user cannot get on the
system, it is unavailable. Systems become unavailable for three main
reasons: application failure, hardware failure, or maintenance.
Hardware failure is less and less the reason for systems being
unavailable. Most large multiprocessor UNIX computers, for example,
are fault tolerant and will bypass bad memory or a failed CPU until
it is repaired. Disk drives have become very reliable, and disk
arrays can be configured to tolerate multiple drive failures. A
disk array that is striped and triple mirrored has a mean time
between failures measured in decades. However, Oracle's Application
Server 10g does not require large computers and is designed to run
on low-cost commodity servers. Computers that are not fault tolerant
must be protected with redundancy. Since the computer cannot detect
errors and work around them, it is up to the software to determine
that a computer has failed and take the necessary actions to
complete the user transaction.
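The idea of software deciding that a computer has failed can be sketched with a simple heartbeat monitor. This is a generic illustration, not how Oracle Application Server 10g implements failure detection; the class name, the server names, and the timeout value are all assumptions made for the example. Timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a server is presumed failed when its heartbeat is stale."""

    timeout_seconds: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, server: str, now: float) -> None:
        # Remember the most recent time this server reported in.
        self.last_seen[server] = now

    def failed_servers(self, now: float) -> List[str]:
        # Any server silent for longer than the timeout is treated as down.
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )

    def pick_server(self, now: float) -> str:
        # Route work to the first server that is still reporting in.
        failed = set(self.failed_servers(now))
        for server in sorted(self.last_seen):
            if server not in failed:
                return server
        raise RuntimeError("no healthy server available")


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=5.0)
    monitor.record_heartbeat("mid1", now=0.0)
    monitor.record_heartbeat("mid2", now=0.0)
    monitor.record_heartbeat("mid2", now=8.0)  # mid1 has gone silent
    print(monitor.failed_servers(now=10.0))
    print(monitor.pick_server(now=10.0))
```

The redundancy lives in the second server; the monitor only supplies the decision that a commodity machine cannot make about itself.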
Application failure occurs when a
request causes the application to fail. This failure includes
anything from human errors that crash a system, to program
exceptions that crash the application (or the OC4J container).
Oracle Corporation data shows that 75 percent of Oracle outages are
the result of human error. In this case the application server must
be able to detect components that have failed and restart them.
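A minimal sketch of the detect-and-restart pattern follows. It is not the application server's own process manager; the `Supervisor` class and the component names are hypothetical, and the "components" are ordinary child processes started with Python's standard `subprocess` module.

```python
import subprocess
import sys
from typing import Dict, List


class Supervisor:
    """Hypothetical watchdog that restarts child processes that have exited."""

    def __init__(self, commands: Dict[str, List[str]]) -> None:
        self.commands = commands
        self.processes: Dict[str, subprocess.Popen] = {}
        self.restart_counts: Dict[str, int] = {name: 0 for name in commands}

    def start_all(self) -> None:
        for name, command in self.commands.items():
            self.processes[name] = subprocess.Popen(command)

    def check_once(self) -> List[str]:
        """Restart every component whose process has exited; return their names."""
        restarted = []
        for name, process in list(self.processes.items()):
            if process.poll() is not None:  # poll() returns an exit code once dead
                self.processes[name] = subprocess.Popen(self.commands[name])
                self.restart_counts[name] += 1
                restarted.append(name)
        return restarted

    def stop_all(self) -> None:
        for process in self.processes.values():
            if process.poll() is None:
                process.terminate()
            process.wait()


if __name__ == "__main__":
    # A stand-in component that exits immediately, simulating a crash.
    supervisor = Supervisor({"worker": [sys.executable, "-c", "pass"]})
    supervisor.start_all()
    supervisor.processes["worker"].wait()
    print("restarted:", supervisor.check_once())
    supervisor.stop_all()
```

In practice `check_once` would run on a timer, and a production watchdog would also cap the restart rate so that a component that crashes on startup does not loop forever.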
Lastly, there is maintenance downtime.
This one is tricky. If you fail to properly back up the application
server, including the Metadata Repository database, you risk losing
the entire system and having to reinstall the
application server and your application. Included in maintenance
downtime is the requirement to update your application or the
application server itself. However, with a properly configured
application server, you will be able to maintain 24/7 availability
to your end users while taking the necessary precautions to
safeguard your system.
Fortunately, one of the real fortes
of Oracle Application Server 10g is its ability to eliminate
downtime. In this chapter we will start with clustering application
server instances to eliminate single points of failure and then dig
down to the abilities of the individual components, like OC4J's
ability to upgrade or deploy your application while it is running.
Eliminating Single Points of Failure
Since Oracle Application Server 10g
achieves high performance on low-cost commodity servers, you do not
need to invest in large, expensive hardware with built-in fault
tolerance. At much lower cost, you can add commodity servers to
create a clustered architecture.
The diagram in Figure 1 shows a
basic architecture that contains complete redundancy from the
Internet and back to the database using Real Application Clusters.
[Figure 1: Application server architecture with complete redundancy.
Diagram labels: Internet, Web Cache, Cluster, Mid-tier, Hardware
Cluster, Infrastructure Tier, Database Tier, Real Application Clusters]
One important note: you must also
create a redundant network infrastructure within the Oracle
Application Server 10g infrastructure to eliminate all single points
of failure. If all the network connections between the midtiers and
the back-end database connect through one switch, that switch is a
single point of failure that can bring down the entire
infrastructure.
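The effect of a single shared switch can be shown with basic availability arithmetic. The sketch below assumes components fail independently, and the availability figures (0.99 per server, 0.999 per switch) are made-up values for illustration only. Components that must all work are multiplied together; redundant copies fail only if every copy fails.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Availability of redundant copies, where any one surviving copy is enough."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    midtier_pair = parallel(0.99, copies=2)   # two clustered midtier servers
    one_switch = 0.999                        # a single shared network switch
    switch_pair = parallel(0.999, copies=2)   # redundant switches

    print(f"two midtiers, one switch:  {series(midtier_pair, one_switch):.6f}")
    print(f"two midtiers, two switches: {series(midtier_pair, switch_pair):.6f}")
```

With one switch, the whole path can never be more available than the switch itself, no matter how many midtier servers are added behind it.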
Let's start at the Internet/intranet
connection and work toward the back-end database.
Web Cache Tier
Redundancy begins with multiple
connections to the Internet (or intranet) attached to a pair of
clustered Web Cache servers. As we discussed in Chapter 5, multiple
Web Cache instances can be clustered so that they not only share
cached content, but also provide failover and load balancing to
multiple midtier application server instances. The Web Cache also
has the ability to respond to a request for content that is
contained in its cache, even if the midtier application servers are
temporarily unavailable. As you learned in Chapter 5, the Web Cache
pings each of the OHS servers to ensure that it does not send a
request to an OHS instance that is down. If you do not have multiple
connections to the network, a single Web Cache instance can support
and load-balance multiple midtier instances. However, in this case
you will have two single points of failure: the Web Cache server and
the router/switch in front of the Web Cache server.
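The combination of health checks and load balancing described above can be sketched as a round-robin selector that skips servers failing a probe. This is a generic illustration and not Web Cache's actual algorithm or configuration; the class name, the `is_up` callback, and the server names are assumptions. For simplicity the probe runs at selection time, whereas a real cache would typically ping in the background.

```python
from typing import Callable, Iterable


class HealthCheckedBalancer:
    """Hypothetical round-robin balancer that skips servers failing a health probe."""

    def __init__(self, servers: Iterable[str], is_up: Callable[[str], bool]) -> None:
        self.servers = list(servers)
        self.is_up = is_up   # stand-in for a ping to the origin server
        self._next = 0       # index of the next server to try

    def choose(self) -> str:
        # Try each server at most once per request, starting where we left off.
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.is_up(candidate):
                return candidate
        raise RuntimeError("no origin server is responding")


if __name__ == "__main__":
    down = {"ohs2"}  # pretend this instance has stopped answering pings
    balancer = HealthCheckedBalancer(
        ["ohs1", "ohs2", "ohs3"], is_up=lambda server: server not in down
    )
    print([balancer.choose() for _ in range(4)])
```

When the failed instance starts answering probes again, it rejoins the rotation without any change to the balancer.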
This is an excerpt from "Oracle
10g Application Server Administration Handbook" by Don Burleson
and John Garmany.