Oracle Application Server Tips by Burleson Consulting

In a nutshell, high availability means that all Oracle Application Server 10g components are readily available to the end-user community. However, keep in mind that high availability is a relative term, and there is a direct trade-off between computing resources and availability.

If your system must have continuous availability, even in the face of a disaster, then expensive (more resource-intensive) failover mechanisms must be implemented in your application, and additional failover servers are required. If your system can tolerate occasional downtime, then less aggressive failover techniques can be used.
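To make the "relative term" concrete, it helps to translate an availability target into the amount of downtime it permits per year. The following is a minimal Python sketch of that arithmetic. The function name and the list of targets are illustrative choices, not anything defined by Oracle Application Server.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; pick the ones your business actually needs.
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why the cost of the failover mechanisms rises so quickly as the target approaches continuous availability.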

There is also a trade-off between recovery time and expense. Since Oracle introduced recovery products 12 years ago, their technologies have evolved significantly. The options range from recovery that can take hours to true continuous availability. At the database level, these techniques range from traditional database recovery, to standby databases, all the way to Real Application Clusters (RAC). As discussed in Chapter 2, the infrastructure repository database can be created in an existing Oracle 9i database (and, once certified, a 10g database), including one that uses RAC to provide that continuous availability.

In this chapter we will focus on the high availability features of Application Server 10g. For information on high availability options for Oracle back-end databases, refer to the books Oracle Database 10g DBA Handbook by Kevin Loney and Bob Bryla (McGraw-Hill/Osborne Media, 2004) or Oracle Database 10g High Availability with RAC, Flashback, and Data Guard by Matthew Hart and Scott Jesse (McGraw-Hill/Osborne Media, 2004).

We will start the discussion with the application server overall and then work down to the high availability features of the separate components.

Why Are Systems Unavailable?

First, what is considered system availability? A system is available if it accepts and processes end-user requests. Basically, if an end user cannot get on the system, it is unavailable. Systems become unavailable for three main reasons: application failure, hardware failure, or maintenance.

Hardware failure is less and less often the reason for systems being unavailable. Most large multiprocessor UNIX computers, for example, are fault tolerant and will bypass bad memory or a failed CPU until it is repaired. Disk drives have become very reliable, and disk arrays can be configured to tolerate multiple drive failures. A disk array that is striped and triple mirrored has a mean time between failures measured in decades. However, Oracle's Application Server 10g does not require large computers and is designed to run on low-cost commodity servers. Computers that are not fault tolerant must be protected with redundancy. Since such a computer cannot detect errors and work around them, it is up to the software to determine that a computer has failed and take the necessary actions to complete the user transaction.

Application failure occurs when a request causes the application to fail. This failure includes anything from human errors that crash a system, to program exceptions that crash the application (or the OC4J container). Oracle Corporation data shows that 75 percent of Oracle outages are the result of human error. In this case the application server must be able to detect components that have failed and restart them.
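The detect-and-restart idea can be sketched in a few lines of Python. This is a generic illustration, not Oracle's implementation: the `Component` class, the `check_and_restart` function, and the `max_restarts` limit are all hypothetical names introduced here, and the liveness check and restart action are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """A monitored process (hypothetical model, not an Oracle API)."""
    name: str
    is_alive: Callable[[], bool]   # caller-supplied liveness probe
    restart: Callable[[], None]    # caller-supplied restart action
    restarts: int = 0              # how many times this component was restarted


def check_and_restart(components: List[Component], max_restarts: int = 3) -> List[str]:
    """Run one monitoring pass and return the names of restarted components."""
    restarted = []
    for component in components:
        if component.is_alive():
            continue
        if component.restarts >= max_restarts:
            # Repeated failures usually need an operator, not another restart.
            continue
        component.restart()
        component.restarts += 1
        restarted.append(component.name)
    return restarted


if __name__ == "__main__":
    # Simulated component that is down until restart() is called.
    state = {"up": False}
    oc4j = Component("oc4j_home", lambda: state["up"], lambda: state.update(up=True))
    print(check_and_restart([oc4j]))  # first pass restarts the component
    print(check_and_restart([oc4j]))  # second pass finds it healthy
```

A real monitor would run such a pass on a timer and log each restart. The restart cap is a design choice that keeps a component that crashes on startup from being restarted forever.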

Lastly, there is maintenance downtime. This one is tricky. Failing to properly back up the application server, including the Metadata Repository database, risks losing the entire system and having to reinstall the application server and your application. Maintenance downtime also includes the time needed to update your application or the application server itself. However, with a properly configured application server, you will be able to maintain 24/7 availability to your end users while taking the necessary precautions to safeguard your system.

Fortunately, one of the real fortes of Oracle Application Server 10g is its ability to eliminate downtime. In this chapter we will start with clustering application server instances to eliminate single points of failure and then dig down to the abilities of the individual components, like OC4J's ability to upgrade or deploy your application while it is running.

Eliminating Single Points of Failure

Since Oracle Application Server 10g achieves high performance on low-cost commodity servers, you do not need to invest in large, expensive hardware with built-in fault tolerance. At much lower cost, you can add commodity servers to create a clustered architecture.
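A rough calculation shows why a cluster of inexpensive servers can match fault-tolerant hardware. The sketch below assumes that servers fail independently and that the cluster is up as long as at least one server is up. Both are simplifying assumptions, and the 99 percent figure is an illustrative number, not an Oracle measurement.

```python
def cluster_availability(server_availability: float, servers: int) -> float:
    """Availability of a cluster that survives while any one server is up.

    Assumes independent failures: the cluster is down only when every
    server is down at the same time, which has probability (1 - a) ** n.
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return 1.0 - (1.0 - server_availability) ** servers


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(f"{count} server(s): {cluster_availability(0.99, count):.6f}")
```

In practice, failures are rarely fully independent (shared power, shared switches, a bad deployment pushed to every node), so the result is an upper bound rather than a prediction.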

The diagram in Figure 1 shows a basic architecture that contains complete redundancy from the Internet connection all the way back to the database, which uses Real Application Clusters.

[Figure 1 labels: Web Cache, Hardware Cluster, Infrastructure Tier, Database Tier, Real Application Clusters]

Figure 1: Application server architecture with complete redundancy

One important note: you must also create a redundant network infrastructure within the Oracle Application Server 10g infrastructure to eliminate all single points of failure. If all the network connections between the midtiers and the back-end database connect through one switch, that switch is a single point of failure that can bring down the entire infrastructure.
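One way to audit a network design for this problem is to model it as a graph and test whether removing any single device cuts the path between two endpoints. The following Python sketch does this with a breadth-first search. The device names (`midtier1`, `switch1`, `database`, and so on) and the function names are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Topology = Dict[str, Set[str]]


def build_topology(links: Iterable[Tuple[str, str]]) -> Topology:
    """Build an undirected adjacency map from a list of device-to-device links."""
    topology: Topology = {}
    for a, b in links:
        topology.setdefault(a, set()).add(b)
        topology.setdefault(b, set()).add(a)
    return topology


def reachable(topology: Topology, start: str, goal: str,
              removed: Optional[str] = None) -> bool:
    """Breadth-first search from start to goal, pretending `removed` has failed."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in topology.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(topology: Topology, start: str, goal: str) -> List[str]:
    """Return devices whose individual failure disconnects start from goal."""
    return sorted(
        node for node in topology
        if node not in (start, goal)
        and not reachable(topology, start, goal, removed=node)
    )


if __name__ == "__main__":
    one_switch = build_topology([
        ("midtier1", "switch1"), ("midtier2", "switch1"), ("switch1", "database"),
    ])
    print(single_points_of_failure(one_switch, "midtier1", "database"))
```

Adding a second switch with links from both midtiers and to the database makes the returned list empty for this pair of endpoints. The check should be repeated for every pair of endpoints that must stay connected.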

Let's start at the Internet/intranet connection and work toward the back-end database.

Web Cache Tier

Redundancy begins with multiple connections to the Internet (or intranet) attached to a pair of clustered Web Cache servers. As we discussed in Chapter 5, multiple Web Cache instances can be clustered so that they not only share cached content, but also provide failover and load balancing to multiple midtier application server instances. The Web Cache can also respond to a request for content that is contained in its cache, even if the midtier application servers are temporarily unavailable. As you learned in Chapter 5, the Web Cache pings each of the OHS servers to ensure that it does not send a request to an OHS instance that is down. If you do not have multiple connections to the network, a single Web Cache instance can support and load-balance multiple midtier instances. However, in this case you will have two single points of failure: the Web Cache server and the router/switch in front of the Web Cache server.
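The behavior described above (skip origin servers that fail a health check, spread requests across the healthy ones, and keep serving cached content when none are available) can be illustrated with a small Python sketch. This is a conceptual model only, not Web Cache's actual algorithm or configuration. The `FrontEndCache` class, the round-robin policy, and the `is_up` and `fetch` callbacks are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


class FrontEndCache:
    """Toy cache that load-balances across healthy origin servers."""

    def __init__(self, origins: List[str],
                 is_up: Callable[[str], bool],
                 fetch: Callable[[str, str], str]) -> None:
        self.origins = origins
        self.is_up = is_up            # caller-supplied health check ("ping")
        self.fetch = fetch            # caller-supplied fetch(origin, url)
        self.cache: Dict[str, str] = {}
        self._next = 0                # round-robin position

    def _pick_origin(self) -> Optional[str]:
        """Return the next healthy origin in round-robin order, or None."""
        for _ in range(len(self.origins)):
            candidate = self.origins[self._next]
            self._next = (self._next + 1) % len(self.origins)
            if self.is_up(candidate):
                return candidate
        return None

    def get(self, url: str) -> str:
        """Serve from cache when possible, otherwise from a healthy origin."""
        if url in self.cache:
            return self.cache[url]
        origin = self._pick_origin()
        if origin is None:
            raise RuntimeError("no origin server available and content not cached")
        body = self.fetch(origin, url)
        self.cache[url] = body
        return body


if __name__ == "__main__":
    up = {"ohs1": True, "ohs2": True}
    cache = FrontEndCache(["ohs1", "ohs2"], lambda o: up[o],
                          lambda o, u: f"{u} from {o}")
    print(cache.get("/index.html"))
    up["ohs1"] = up["ohs2"] = False
    print(cache.get("/index.html"))  # still served, from the cache
```

The sketch omits cache expiration and invalidation, which a production cache needs in order to avoid serving stale content.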

