Java vs. PL/SQL: Where Do I Put the SQL?
Many articles have been written about the use of PL/SQL vs. Java. These articles tend to focus on the intricacies of each language and their strengths and weaknesses as the server-side logic. This article won't focus on this issue but will instead concentrate on an issue that faces Java and PL/SQL developers alike -- where do you put the SQL? The SQL is a major part of each application, and its location can greatly enhance or degrade performance, distribution, and maintenance.
Java is certainly mainstream at this point and is a very powerful language. Even though it's a personal favorite of this author, it still has its disadvantages. PL/SQL, while obviously database-centric (Oracle), also has some enormous benefits. Good developers use both where appropriate and often find themselves perplexed with the same issue -- the location of the SQL statements.
When I say "the location of the SQL," I'm referring to its storage location. The developer has many options when writing Java applications, servlets, and applets that access databases. You can store SQL in the application itself (Java side) and access the database directly from the applet via a SQL statement. Or the programmer can make a request from the applet to the servlet, which then sends the SQL to the database. Another option within Java is to simply make a procedure or function call (Callable Statement in JDBC Driver) and process the result set. In the last scenario, the SQL would reside inside the database in an Oracle package. This article explores the advantages and disadvantages of these options and includes performance benchmarks. It's my intention to objectively answer my own questions while helping to make your decision easier when you encounter this issue in your own projects.
The Power of a Package
If you have the option of installing a PL/SQL package as part of your application, you should seriously consider installing it. (Although the focus of this article is on PL/SQL, the package could also be written in Java in versions of Oracle 8.1 and later.) PL/SQL packages have many advantages:
- Privilege Management-Instead of being concerned about whether each user has the rights to perform a function and trapping exceptions throughout your code, you can grant execute on a package. The user inherits rights to all of the underlying objects indirectly through package execution. For example, let's assume that part of your code issued a TRUNCATE command on a table. If the command is issued as the connected user, you can expect privilege problems and would need to resolve these problems by granting the proper privileges to the user. You would either have to connect as someone with privileges behind the scenes or put the TRUNCATE command in a procedure call in the package. The procedure becomes the gatekeeper of the transaction and ensures that any access to the underlying objects goes through the procedure.
- Global Location-Having the SQL in one spot is the most flexible option: By having calc_inventory() as a procedure call, any application that can issue a database call can benefit from the procedure. Your Java apps, applications, Oracle*Forms, Visual Basic, or any application that can issue a SQL statement can easily access the same result because it's the same code that produces the result. This solution is preferable to reproducing the same algorithm in different code bases maintained by different people. And it surpasses trying to access methods in one language from the architecture of another by leveraging APIs and RPCs-a strategy that's more complex to build and maintain. Put the SQL in the database and be done with it.
- Performance-Most of the time, performance will be the driving issue in the decision of how to partition an application. Very few users are interested in elegant architectures when their query takes five minutes to return after they clicked a button. The fact is that you can't run a SQL statement any faster than running it inside the Oracle kernel. But at what point is performance really an issue? Is a package faster when submitting a single SQL statement? Or is it only when submitting two or more that a package is faster? When is it clearly the best solution for performance reasons? I'll show later in this article that PL/SQL is faster with some tasks and slower with others.
- Interface Agnostic-Software architects learned long ago that it makes sense to separate back-end logic from the interface. Interface technologies change much more frequently than databases. Whether it's Java Swing, AWT, Visual Basic, PowerBuilder, or Oracle*Forms, a well-designed application can support them all. Decouple the business logic from the user interface controls and you're well on your way.
- Global Variables-Global variables are useful, and yes, Java has static variables as well. The difference is that global package variables apply to anyone who accesses them regardless of the application. This allows you to maintain data across and between transactions through persistent data. A user in SQL*Plus, JDBC, Java, Visual Basic, and ODBC will inherit the performance benefits. The following code shows an example since it only calculates the global cost when the global cost variable hasn't been set (=0).
Function x returns VARCHAR2
BEGIN
If global_cost = 0 then -- global_cost is
defined outside the scope
Calc_global_cost(); -- (first time in only)
endif;
END
The Flexibility of Java
Java also has some benefits as a home for your Oracle SQL statements. These benefits include:
- Simplicity-Embedding SQL statements in native Java code is much easier than building additional pieces, like PL/SQL packages.
- Debugging-Although tools exist that enable the debugging of PL/SQL, they're less robust than the functionality in existing Java IDEs, like JBuilder. Being able to set breakpoints and inspect variable values is a critical requirement for any developer, and it's preferable to do that from a single development environment.
- Distribution-Having the code encapsulated in one place makes the distribution and management of the application much easier. If a user can simply access a URL and use your application instantly, then you have a satisfied user. If you create a package in the database, you'll have to install that package in each database to be accessed. By keeping the SQL in the Java code, you can connect to any database and don't have to worry about PL/SQL package maintenance on each node.
- Performance-When performing many inserts or updates, Java performs better because it has the ability to batch these statements. The JDBC driver provides the ability to queue the request and when the number of queued requests reaches the batch size, JDBC sends them to the database for execution.
Performance
At first glance, the main performance degradation for a SQL request is sending and receiving the request across the network. We also know that most boosts in performance rely on the architect eliminating or reducing the number of round trips to the database.
A logical assumption is that batching requests and then sending them as a group would greatly enhance performance. But how much does it enhance performance? And what's the threshold that seems to be the decision point? Is a single statement faster in a package? Two statements? Or is it only when you have 25-50 statements that a package becomes the ideal choice? These are the questions that I hope to answer in the benchmark tests in the following section.
Benchmarks
The timings for these benchmark tests are given in milliseconds and are calculated by making the call System.currentTimeMillis() in the Java code. Each test was performed 10 times, and the average result is reported. Each test was performed locally, over an internal network, and remotely, over a broadband connection (DSL).
Environment
Database: Oracle 8.1.5 on HP-UX B.11.00 A 9000/785
JDBC Driver: Oracle Thin 8.1.6.0
Client machine: Pentium II NEC Laptop 366 MHz with 256MB RAM
Java Virtual Machine: 1.3.0_01
The Tests
My test scenarios consisted of timing the particular statements during their execution only. I tried to eliminate the definition of the statement as well as processing and closing of any result sets. You'll notice in the Java code that the clock is started right before the execute statement and stopped after it returns.
I also used prepared statements in my tests instead of regular statements. It's more efficient to use PreparedStatement with bind variables for frequently executed statements. Although PreparedStatement is inherited from Statement, it's different in the following two ways:
- Each time you execute a Statement object, its SQL statement is compiled. However, when you execute a PreparedStatement, its SQL statement is only compiled when you first prepare the PreparedStatement.
- You can specify parameters in the PreparedStatement SQL string, but not in a Statement SQL string. Single statement:
The single statement test is a very simple test. I tried to create a statement that everyone could run on their machines, and one that wouldn't be answered immediately to ensure that the time reported wasn't solely network communication time. At the time I ran this query, I had 3,194 objects reported in DBA_OBJECTS (see Listing 1).
Java:SQLText = "select count(*) from dba_objects"; pstmt = databaseConnection.prepareStatement(SQLText); startTime = System.currentTimeMillis(); ResultSet rs = pstmt.executeQuery(); rs.next(); x = rs.getString(1);PL/SQL:
FUNCTION single_statement RETURN VARCHAR IS row_count PLS_INTEGER := 0; BEGIN select count(*) into row_count from dba_objects; return row_count; END single_statement;
Listing 1: The single statement test.
Multiple Statements
The multiple statement test was a bit more difficult. At first, I used 10 different queries. Later I decided to use the same query 10 times and place it in a loop instead of coding it 10 times (see Listing 2).
Java:
SQLText = "select count(*) from dba_objects"; pstmt = databaseConnection.prepareStatement(SQLText); ResultSet rs = null; startTime = System.currentTimeMillis(); while (multiCount < 10) { rs = pstmt.executeQuery(); /* don't include result set processing in the timings since we do not do it in the PL/SQL rs.next(); x = rs.getString(1); */ multiCount ++; } multiStatementJava = multiStatementJava + System.currentTimeMillis() - startTime;
PL/SQL:
FUNCTION multiple_statements RETURN VARCHAR IS row_count PLS_INTEGER := 0; num_objects VARCHAR2(20); BEGIN WHILE row_count < 10 LOOP select count(*) into num_objects from dba_objects; row_count := row_count + 1; END LOOP; return row_count; END multiple_statements;
Listing 2: The multiple statement test.
Truncate
The truncate test simply truncates a table. No result set is involved. I included this test to observe the benchmarks when no rows are returned (see Listing 3).
Java:
SQLText = "TRUNCATE TABLE SOOTHSAYER.BMC$PKK_INSTANCE_STATS"; pstmt = databaseConnection.prepareStatement(SQLText); startTime = System.currentTimeMillis(); ResultSet rs = pstmt.executeQuery(); truncateJava = truncateJava + System.currentTimeMillis() - startTime;
PL/SQL:
Procedure truncate_table IS trunc_command varchar2(100); BEGIN trunc_command := 'TRUNCATE TABLE BMC$PKK_INSTANCE_STATS'; execute immediate (trunc_command); END truncate_table; Java Oracle package
Listing 3: The truncate test.
Benchmark Results
The following table shows the average local test results, in milliseconds, for each type of test:
Java | Oracle package | |
DB on HP machine | ||
Single statement | 47 | 48 |
Multiple statements | 448 | 376 |
TRUNCATE | 88 | 82 |
Single Statement Results
The single statement test shows nearly equal results after 10 executions. The difference of 1 ms seems negligible (although the remote test provides a very different result). I was a bit surprised at how close these two results were, so I decided to see how much time was spent going to the server and not executing the SQL statement. I did this by commenting out the one line of work in the function:
FUNCTION single_statement RETURN VARCHAR
IS
row_count PLS_INTEGER := 0;
BEGIN
-- select count(*) into row_count from dba_objects;
return row_count;
END get_row_count;
It took an average of 8 ms (~16% of total execution time) to access the procedure and return, without actually executing the statement. This result tells me that each trip I can eliminate will save 8 ms.
Multiple Statements Results
The results of the multiple statement test confirm my assumption. The Oracle package is faster because it makes one trip to the database, does its work, and then returns. This result clearly shows that a package should be used when the complete unit of work can be performed on the database. Larger units of work will result in more significant performance gains.
To determine whether any overhead was incurred because the procedure was inside a package, I eliminated the package and created a stand-alone procedure. The results showed no impact; the numbers were the same.
Truncate Results
The results of the truncate test were surprising-the execution is actually faster in Java than in PL/SQL. Oracle must be eliminating some overhead in the JDBC driver that isn't eliminated in PL/SQL.
Remote Results
The following table shows the average remote test results, in milliseconds, for each type of test:
HP machine via broadband connection (DSL) | Java | Oracle package |
Single statement | 286 | 113 |
Multiple statements | 1662 | 506 |
TRUNCATE | 217 | 332 |
The test results from the broadband connection are revealing:
- A 1 ms difference in the single statement test equates to 173 ms with a slow connection speed.
- The results of the multiple statement test are overwhelming, showing that the PL/SQL package is three times faster.
- Once again, the TRUNCATE command is faster in Java than in PL/SQL.
Although we might easily discount a difference of a few milliseconds as being "close enough," as in the first single statement test, the remote test shows that we should always consider the faster approach. A query that executes only 5 ms slower in Java than in a PL/SQL package (or vice versa) might not seem like an issue. However, if the query is executed 5,000 times per day, that difference affects performance considerably. Also consider that the difference of 5 ms might occur when you're testing the code at work on a T1 line with the server three feet from your desk. But when you test the code over an ISDN, cable, or DSL connection, that 5 ms can become 55 ms or 300 ms. Every millisecond counts when performance tuning.
Conclusion
It's hard to declare a clear winner in this topic. Many factors demand a combination of strategies. The ultimate decision should weigh the following factors and their applicability to the application:
- Unit of Work (UOW)-If the UOW is one SQL statement, then creating a function solely for it makes little sense. However, if the unit of work is a series of SQL statements with processing in between, a package might provide the best solution. This solution assumes that:
- the Graphical User Interface doesn't need to be informed of the status of the work, as is typical in a progress bar of a GUI control; and
- the database can perform all processing required.
- Network Speed-We witnessed the impact of running the same program over the internal network vs. a DSL connection. If network speed becomes an issue, the use of the package is preferable.
- Database Accessibility-If the application gives the option to connect to any database or a large number of databases, having the code in the Java eliminates the distribution and maintenance of the package. All of the code used in the tests is available in the Source Code file at www.oracleprofessionalnewsletter.com and can be used as a template to test your SQL statements in your environment.
--
Dave Moore is a product architect at BMC Software in Austin, TX. He's also a Java and PL/SQL developer, Oracle DBA and author of Oracle Utilities by Rampant Tech Press.
Contributors : Dave Moore
Last modified 2005-05-25 01:58 PM
Where to put SQL
I have recently been working with apps that require many, connections, many databases and many types of databases all at the same time and think I have found a reasonable way to manage it.
The method is to use the database as a resource to hold the SQL for the query just like you would use a resource file for internationalization. This allows you to change SQL without a re-compile just like a stored proc. The speed is the same as far as I can see. I find this better than a stored proc because the SQL in the database can be for any other database on some other server that would never talk through a stored proc; like SQL Server to MySQL or MySQL to SQLite. It also allows you to compile database dialects and have an script for each type. Want to switch to MySQL? Reload with this set of SQL statements. Using Postgres? No problem.
It seems like a no-brainer to me but people still hang onto stored procs and prepared statements as if the database was actually generating native code. It's, at best, byte code that needs to be interpreted. Do that for each prepare statement and you'll be slower. SQL resources is the way to go!