Data center architects naturally seek to employ server virtualization to maximize the use of their hardware systems. However, one factor – often overlooked – carries real potential to undermine this goal. That factor is data connectivity. This article examines the importance of data connectivity in a virtualized environment, and the need to take an intelligent approach to data access to truly reap the benefits of your virtualization strategy.
As database optimization and the performance of processors and other server hardware have advanced over the years, the performance bottleneck has moved to the database middleware: the software drivers that provide connectivity between applications and databases. Between 75% and 95% of the response time associated with database access can often be attributed to the data connectivity layer, and that is on traditional non-virtualized servers. Running multiple virtual servers on a single machine can introduce additional complications involving data access.
Old Problems Become New Again
Exponential improvements in processor speed and design, continual strides in network capacity, and commoditized memory together promised to make hardware resource contention a thing of the past. However, with new capacity come new applications and new uses for information technology. In reality, the demand for applications is outstripping the ability of hardware improvements to accommodate them. That’s one reason why, according to IT research firm IDC, the number of x86 servers is projected to grow 39% by 2010 (adjusted down from an initially projected 61% due to the expected impact of server virtualization).
Consider this trend in light of the value proposition presented by virtualization technology: that you can use software to run multiple virtual machines (VMs) on the same physical machine formerly employed as a dedicated server. Now multiple operating systems and their attendant applications must vie for the same discrete resources, such as processor capacity, memory, storage I/O, and network I/O. The dormant issues of resource contention arise once again. Naturally, anyone considering a virtualized server environment must plan for sufficient hardware resource capacity to accommodate it. But adding capacity is not always feasible; flexibility in expanding network I/O, for instance, is available only on relatively high-end machines.
Bear in mind why you decided to virtualize your server environment in the first place. Virtualization came about in large part as a solution to address the fact, widely reported in industry media, that dedicated single-application servers were typically running at a very low rate of utilization – 10% to 15% of capacity, according to one IDC analyst. Data centers were paying for hardware-based resources of which they used only a fraction. Virtualization lets you make more efficient and cost-effective use of those resources. Overprovisioning on the hardware side to accommodate virtualization would be a self-defeating proposition. It follows that, for the benefit of maximized resource usage to be fully realized, the applications running on VMs must also maximize efficiency. Data connectivity components are no exception. If the database connectivity components that you are using are not efficient in their use of CPU, memory, storage, and network I/O, your virtualization efforts will fail.
Data Connectivity Is Not All the Same
The differences among data connectivity components such as ODBC and JDBC drivers and ADO.NET data providers are typically poorly understood, even by many data specialists. A key factor contributing to this lack of awareness is that the most widely used commercial databases all include data connectivity components at no additional charge; these are quite often simply used by default in connecting a particular database to various applications. The open source community, too, offers data connectivity software. However, insisting on the use of such “free” but often substandard data connectivity components can cost organizations more than they anticipate in terms of inadequate performance.
Within the context of a virtualized environment, if the data connectivity middleware is not designed to be as streamlined and efficient as possible (if it employs client libraries, disk caching, or verbose database communications, for example), its overall consumption of hardware resources could be considerable.
Figures 1-3 illustrate the results of tests comparing resource usage on a single server for two components: the standard data connectivity component included with the relational database from a major vendor, and a commercially available third-party data connectivity component specifically designed for high performance. Figure 1 compares raw throughput for the two components, measured in database rows read per second by an accessing application.
As expected, the graph shows that using the high-performance data connectivity component yields a 25–50% edge in throughput performance. A look at the difference in how efficiently the two components use hardware resources to deliver their respective performance, however, is more interesting to anyone considering a virtualized server environment. Figure 2 measures, for the same two components, the database rows read by the same accessing application per second of CPU usage.
As reflected in this graph, use of the third-party high-performance data connectivity component yields about twice the efficiency in CPU usage over the one included with the major relational database. The next graph, measuring the differences in total memory usage, is even more notable.
Figure 3 shows the third-party component using as little as one-sixth of the memory consumed by the vendor-provided component to deliver the superior throughput shown in Figure 1.
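The two rate metrics behind Figures 1 and 2 can be approximated with ordinary timers. The following is a minimal sketch of how such a measurement might be taken, not the test harness used for the figures: it reads rows from a throwaway in-memory SQLite table (a stand-in for a real database and driver) and reports rows per second of elapsed time and rows per second of CPU time. The function names `measure_read` and `build_sample_db` are hypothetical.

```python
import sqlite3
import time


def build_sample_db(n_rows):
    """Create a throwaway in-memory table (a stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO t (payload) VALUES (?)",
        (("x" * 50,) for _ in range(n_rows)),
    )
    conn.commit()
    return conn


def measure_read(conn, query):
    """Read every row of `query`; return (rows, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time consumed by this process
    rows = 0
    for _ in conn.execute(query):
        rows += 1
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return rows, wall, cpu


conn = build_sample_db(50_000)
rows, wall, cpu = measure_read(conn, "SELECT id, payload FROM t")

# max(..., 1e-9) guards against a zero reading on coarse clocks.
print(f"rows read:           {rows}")
print(f"rows per second:     {rows / max(wall, 1e-9):,.0f}")
print(f"rows per CPU second: {rows / max(cpu, 1e-9):,.0f}")
```

Running the same query through two different drivers and comparing the last two figures gives a rough equivalent of the throughput and CPU-efficiency comparisons; memory usage would have to be sampled separately with an operating system tool.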
The differences in CPU and memory usage shown in Figures 2 and 3, measured on a traditional non-virtualized server, become highly significant in a virtualized scenario, where additional VMs contend for the resources of a single, heavily utilized physical machine. In that case, hardware contention could sharply curtail the number of VMs you can feasibly run on that machine. Since leveraging your hardware resources to the fullest is the goal, it pays to understand the differences between data connectivity components.
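To see how per-VM resource usage translates into VM density, consider a rough back-of-the-envelope calculation. Every number in the sketch below is hypothetical and chosen only to illustrate the arithmetic; the only ratios borrowed from the tests above are "one-sixth of the memory" and "twice the CPU efficiency" for the connectivity layer. The helper `max_vms` is an illustrative name, not part of any product.

```python
def max_vms(host_memory_gb, host_cpu_cores, vm_memory_gb, vm_cpu_cores):
    """Number of VMs a host can hold, limited by the scarcer resource."""
    by_memory = int(host_memory_gb // vm_memory_gb)
    by_cpu = int(host_cpu_cores // vm_cpu_cores)
    return min(by_memory, by_cpu)


# All figures below are hypothetical, chosen only to illustrate the arithmetic.
HOST_MEMORY_GB = 64
HOST_CPU_CORES = 16
APP_MEMORY_GB = 2.0   # per-VM memory used by everything except connectivity
APP_CPU_CORES = 0.5   # per-VM CPU used by everything except connectivity

# Connectivity-layer footprint per VM. The efficient component is assumed to
# need one-sixth of the memory and half of the CPU for the same workload.
vendor = {"memory_gb": 6.0, "cpu_cores": 1.0}
efficient = {"memory_gb": 6.0 / 6, "cpu_cores": 1.0 / 2}

for name, footprint in (("vendor-provided", vendor), ("high-efficiency", efficient)):
    vms = max_vms(
        HOST_MEMORY_GB,
        HOST_CPU_CORES,
        APP_MEMORY_GB + footprint["memory_gb"],
        APP_CPU_CORES + footprint["cpu_cores"],
    )
    print(f"{name}: {vms} VMs per host")
# vendor-provided: 8 VMs per host
# high-efficiency: 16 VMs per host
```

Under these assumed numbers the host is memory-bound at 8 VMs with the heavier component and CPU-bound at 16 VMs with the lighter one. Real ratios depend entirely on how much of each VM's footprint the connectivity layer accounts for.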
The Impact of Architecture
The first thing to be aware of is the general architecture of the data connectivity component. Many data connectivity components are complicated and slowed by the use of database vendor client libraries. These comprise additional software that must be installed on the same server as the application. Client libraries are very general-purpose; they are designed to cover the widest possible range of connectivity scenarios. They introduce additional steps into the connectivity process and thus reduce performance and scalability.
Wire protocol data connectivity components are different, as shown in the diagrams in Figure 4 that illustrate this point for ODBC drivers. These use the same (officially supported) protocols used by the native database clients and thus communicate directly with the database at the network level, requiring no use of client libraries.
This streamlined architecture enhances performance and scalability by reducing complexity. The use of client libraries also works against a primary reason for virtualizing in the first place: reducing staff involvement through streamlined administration. As mentioned earlier, the libraries must be deployed on each server. For a machine running four virtual machines, these libraries must be installed and configured four separate times! Wire protocol components require no client libraries and, with fewer components involved, considerably less configuration, which lowers the cost of administration.
Knowledge Is Efficiency
While database neutrality is important in choosing a data connectivity supplier, so is a deep, detailed, and up-to-date knowledge of whatever database and database version you happen to be connecting to, as well as the inner workings of the operating system platforms and the latest versions of the connectivity standards in use. This requires that the middleware vendor maintain close partnerships with both database and operating system vendors, as well as active involvement with IT standards organizations.
Intelligent, current, and well-informed implementation of standards, programming languages, and protocols on the part of data connectivity components can contribute to resource usage efficiency in any number of ways. Let’s say, for example, that a database vendor protocol offers a choice in approaches for executing SQL: the first employing one round-trip over the network and the second employing two. The one employing two round-trips actually makes for a more straightforward match to the database connectivity API and is therefore simpler to implement. However, developers who strive for maximum efficiency and performance will make the extra effort to design database connectivity middleware that takes the single round-trip approach. Besides gaining performance, eliminating the additional round-trip reduces the database connectivity component’s usage of the network – which, you’ll recall, is one of the discrete resources for which multiple virtual OSes and applications must compete.
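The effect of the extra round-trip is easy to estimate. The sketch below is a toy model, not any vendor's actual protocol: `CountingChannel`, the `PREPARE`/`EXECUTE` message strings, and the 0.5 ms latency are all assumptions made for illustration. It simply counts request/response pairs for the two approaches.

```python
class CountingChannel:
    """Stand-in for a network link to a database; counts round-trips."""

    def __init__(self, latency_ms):
        self.latency_ms = latency_ms
        self.round_trips = 0

    def request(self, message):
        # One request plus its response is one round-trip.
        self.round_trips += 1
        return f"ack:{message}"

    @property
    def network_time_ms(self):
        return self.round_trips * self.latency_ms


def execute_two_step(channel, sql):
    """Simpler to implement: prepare, then execute (two round-trips)."""
    channel.request(f"PREPARE {sql}")
    return channel.request("EXECUTE")


def execute_single_step(channel, sql):
    """More effort to implement: prepare and execute in one message."""
    return channel.request(f"PREPARE+EXECUTE {sql}")


STATEMENTS = 1000
LATENCY_MS = 0.5  # hypothetical per-round-trip network latency

for label, run in (("two-step", execute_two_step), ("single-step", execute_single_step)):
    channel = CountingChannel(LATENCY_MS)
    for i in range(STATEMENTS):
        run(channel, f"SELECT * FROM orders WHERE id = {i}")
    print(f"{label}: {channel.round_trips} round trips, {channel.network_time_ms} ms")
# two-step: 2000 round trips, 1000.0 ms
# single-step: 1000 round trips, 500.0 ms
```

In this model the single round-trip approach halves both the time spent waiting on the network and the number of messages placed on a link that other VMs share.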
An intimate working knowledge of the latest technology and features of a target database also allows a data connectivity supplier to leverage any advantages favorable to virtualization. Some suppliers offer in their products a variety of external options that let users control efficiency and resource usage from a data connectivity standpoint. Better still are suppliers who combine their specialization in data access and its diverse uses with these external controls, providing wizards that guide less-knowledgeable staff in configuring the connectivity components for their particular application and environment.
Optimizing data connectivity middleware for your virtualized environment, however, doesn’t require that you familiarize yourself with the often arcane details of ever-evolving connectivity and systems integration standards as well as database and OS platforms. The intelligent approach to take with data access is to enlist a third-party supplier that has dedicated expertise in this area. That supplier will be able to assess your planned or existing virtualized environment and recommend a data connectivity middleware solution that will make the most efficient use of your server systems’ hardware-based resources, thereby optimizing your virtualization strategy.
To sum up the important factors to look for in choosing a data connectivity solution and/or vendor for your virtualized server environment:
The best route to meeting these criteria for data connectivity solutions in virtualized environments is to look for a connectivity middleware supplier that has data connectivity as its core specialty. You want an independent vendor that is thoroughly steeped in the very latest technological nuts and bolts of any database, operating system, applications platform, and connectivity standard they support. Ask about the vendor’s IT partnerships and involvement in relevant standards bodies.
Whatever your specific details and goals are in your approach to virtualization, don’t neglect the data connectivity layer of your system stack. It can make all the difference in getting the most out of your virtualized server environment.
Data Connectivity Can Spell Virtualization Success
To illustrate the difference that incorporating a high-performance data connectivity solution can make in a virtualized server environment, consider a company that provides critical business intelligence to customers such as government agencies, private security firms, and financial institutions.
The company requires vast stores of data – over 25TB stored in a major relational database. Its strict service level agreement (SLA) requirements involve heavy penalties for failure to provide requested data within specified timeframes. With a rapid increase in the number of customers as well as in the amount of data to process for those customers, the company finds itself fast running out of space in its data center. Expansion is not an option, and relocating would mean a major disruption to operations.
The IT department instead adopts a strategy of deploying blade servers and virtual machines to concentrate space used by the server farm. Halfway through the conversion it becomes clear that the maximum number of VMs each server can adequately accommodate while meeting the delivery strictures of the
Diagnosis identifies the largest performance and scalability bottleneck as occurring in the database layer. By replacing the JDBC drivers supplied by the database vendor with third-party high-performance JDBC drivers that support optimizations such as connection and statement pooling, the data center realizes a threefold gain in application response time. With the server farm now comfortably able to accommodate two additional VMs per server, the space-saving project can be successfully completed.
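Connection pooling, one of the optimizations mentioned above, can be sketched in a few lines. The example below is a simplified illustration in Python using the standard library's SQLite module as a stand-in for a networked database; it is not the JDBC driver used in the case study, and the `ConnectionPool` class is a hypothetical name. Statement pooling is not shown.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them out on request."""

    def __init__(self, size, database=":memory:"):
        self.opened = 0  # how many physical connections were ever created
        self._database = database
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(self._open())

    def _open(self):
        self.opened += 1
        # check_same_thread=False lets a pooled connection move between threads.
        return sqlite3.connect(self._database, check_same_thread=False)

    @contextmanager
    def connection(self, timeout=5.0):
        """Borrow a connection; it returns to the pool when the block exits."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = ConnectionPool(size=2)
for _ in range(100):
    with pool.connection() as conn:
        result = conn.execute("SELECT 1").fetchone()

# 100 requests were served, but only 2 connections were ever opened.
print(f"requests: 100, connections opened: {pool.opened}")
pool.close()
```

Because each request reuses an existing connection instead of opening a new one, the cost of connection setup (network handshakes, authentication, and session memory on a real database) is paid only once per pooled connection, which is the kind of saving that matters when several VMs share one machine.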
