Db2 Warehouse delivers 4x faster query performance than previously, while cutting storage costs by 34x


Data warehouses are a critical component of any organization’s technology ecosystem. They provide the backbone for a range of use cases such as business intelligence (BI) reporting, dashboarding, and machine-learning (ML)-based predictive analytics that enable faster decision making and insights. The next generation of IBM Db2 Warehouse adds cloud object storage support with advanced caching to deliver 4x faster query performance than the previous generation, while cutting storage costs by 34x [1].


The introduction of native support for cloud object storage (based on Amazon S3) for Db2 column-organized tables, coupled with our advanced caching technology, helps customers significantly reduce their storage costs and improve performance compared to the current generation service. The adoption of cloud object storage as the data persistence layer also enables users to move to a consumption-based model for storage, providing for automatic and unlimited storage scaling.

This post highlights the new storage and caching capabilities, and the results we are seeing from our internal benchmarks, which quantify the price-performance improvements.

Cloud object storage support

The next generation of Db2 Warehouse introduces support for cloud object storage as a new storage medium within its storage hierarchy. It allows users to store Db2 column-organized tables in object storage in Db2’s highly optimized native page format, all while maintaining full SQL compatibility and capability. Users can combine the existing high-performance cloud block storage with the new cloud object storage support and its advanced multi-tier NVMe caching, which gives existing databases a simple path towards adopting the object storage medium.

The following diagram provides a high-level overview of the Db2 Warehouse Gen3 storage architecture:

A high-level overview diagram of the Db2 Warehouse Gen3 storage architecture

As shown above, in addition to the traditional network-attached block storage, there is a new multi-tier storage architecture that consists of two levels:

  1. Cloud object storage based on Amazon S3: Objects associated with each Db2 partition are stored in a single pool of petabyte-scale object storage provided by public cloud providers.
  2. Local NVMe cache: A new layer of local storage supported by high-performance NVMe disks that are directly attached to the compute node and provide significantly faster disk I/O performance than block or object storage.

In this new architecture, we have extended the existing buffer pool caching capabilities of Db2 Warehouse with a proprietary multi-tier cache. This cache extends the existing dynamic in-memory caching capabilities with a compute-local caching area supported by high-performance NVMe disks. This allows Db2 Warehouse to cache larger datasets within the combined cache, thereby improving both individual query performance and overall workload throughput.
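
The lookup order described above (in-memory cache, then local NVMe cache, then object storage) can be illustrated with a small read-through cache. This is a minimal sketch, not Db2's proprietary implementation: the class names, the LRU eviction policy, and the dictionary standing in for object storage are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUTier:
    """A fixed-capacity cache tier with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id):
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)  # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


class TieredReader:
    """Read-through lookup: memory tier, then NVMe tier, then object storage."""

    def __init__(self, memory_pages, nvme_pages, object_store):
        self.memory = LRUTier(memory_pages)  # stands in for the in-memory buffer pool
        self.nvme = LRUTier(nvme_pages)      # stands in for the local NVMe cache
        self.object_store = object_store     # a dict standing in for object storage
        self.hits = {"memory": 0, "nvme": 0, "object_store": 0}

    def read(self, page_id):
        data = self.memory.get(page_id)
        if data is not None:
            self.hits["memory"] += 1
            return data
        data = self.nvme.get(page_id)
        if data is not None:
            self.hits["nvme"] += 1
        else:
            data = self.object_store[page_id]  # slowest path: remote fetch
            self.hits["object_store"] += 1
            self.nvme.put(page_id, data)       # warm the NVMe tier
        self.memory.put(page_id, data)         # promote into memory
        return data


store = {i: f"page-{i}" for i in range(10)}
reader = TieredReader(memory_pages=2, nvme_pages=4, object_store=store)
for page_id in [0, 1, 2, 3, 0, 1, 2, 3]:
    reader.read(page_id)
print(reader.hits)
```

In this toy run the working set of four pages does not fit in the two-page memory tier but does fit in the NVMe tier, so the second pass over the pages is served entirely from NVMe rather than from object storage. That is the effect the combined cache is meant to achieve for datasets larger than memory.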

Performance benchmarks

In this section, we show results from our internal benchmarking of Db2 Warehouse Gen3. The results demonstrate that we were able to achieve roughly 4x faster query performance [1] compared to the previous generation by using cloud object storage optimized by the new multi-tier cloud storage layer instead of storing data on network-attached block storage. Additionally, moving the data from block storage to object storage results in a 34x reduction in cloud storage costs.

For these tests, we set up two equivalent environments with 24 database partitions on two AWS EC2 nodes, each with 48 cores, 768 GB memory and a 25 Gbps network interface. The Db2 Warehouse Gen3 environment additionally had 4 NVMe drives per node for a total of 3.6 TB per node, with 60% allocated to the on-disk cache (180 GB per database partition, or 2.16 TB per node).
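
The cache sizing quoted above can be checked with a few lines of arithmetic. This sketch assumes the 3.6 TB NVMe figure applies to each node and uses decimal units (1 TB = 1000 GB); under that reading the stated numbers agree with each other.

```python
# Figures taken from the test setup described above.
NODES = 2
DATABASE_PARTITIONS = 24
NVME_GB_PER_NODE = 3600   # assumption: the 3.6 TB total is per node
CACHE_PERCENT = 60        # share of NVMe capacity given to the on-disk cache

partitions_per_node = DATABASE_PARTITIONS // NODES
cache_gb_per_node = NVME_GB_PER_NODE * CACHE_PERCENT // 100
cache_gb_per_partition = cache_gb_per_node // partitions_per_node

print(f"partitions per node:     {partitions_per_node}")
print(f"cache per node (GB):     {cache_gb_per_node}")
print(f"cache per partition (GB): {cache_gb_per_partition}")
```

With 12 partitions per node, 60% of 3.6 TB is 2.16 TB of cache per node, or 180 GB per partition.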

In the first set of tests, we ran our Big Data Insight (BDI) concurrent query workload on a 10 TB database with 16 clients. The BDI workload is an IBM-defined workload that models a day in the life of a Business Intelligence application. The workload is based on a retail database with in-store, online, and catalog sales of merchandise. Three types of users are represented in the workload, running three types of queries:

  • Returns dashboard analysts generate queries that investigate the rates of return and impact on the business bottom line.
  • Sales report analysts generate sales reports to understand the profitability of the enterprise.
  • Deep-dive analysts (data scientists) run deep-dive analytics to answer questions identified by the returns dashboard and sales report analysts.

Chart showing the difference in query performance between Gen3 and the current generation. Gen3 is 4x faster.

For this 16-client test, 1 client was performing deep dive analytic queries (5 complex queries), 5 clients were performing sales report queries (50 intermediate complexity queries) and 10 clients were performing dashboard queries (140 simple complexity queries). All runs were measured from cold start (i.e., no cache warmup, both for the in-memory buffer pool and the multi-tier NVMe cache). These runs show 4x faster query performance results for the end-to-end execution time of the mixed workload (213 minutes elapsed for the previous generation, and only 51 minutes for the new generation).
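
As a quick check, the headline speedup follows directly from the two elapsed times reported above. The variable names in this short sketch are illustrative only.

```python
# Client mix for the 16-client BDI run, as described above.
clients = {"deep_dive": 1, "sales_report": 5, "dashboard": 10}
total_clients = sum(clients.values())

# End-to-end elapsed times for the mixed workload, in minutes.
elapsed_previous_gen = 213
elapsed_gen3 = 51

speedup = elapsed_previous_gen / elapsed_gen3
time_reduction_pct = (1 - elapsed_gen3 / elapsed_previous_gen) * 100

print(f"clients: {total_clients}")
print(f"speedup: {speedup:.2f}x")
print(f"elapsed time reduced by {time_reduction_pct:.0f}%")
```

The ratio works out to about 4.18x, which the post rounds to 4x, and corresponds to roughly a 76% reduction in end-to-end elapsed time.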

The significant difference in query performance is attributed to the efficiency gained through our multi-tier storage layer, which intelligently clusters the data into large blocks designed to minimize high-latency accesses to cloud object storage. This enables a very fast warmup of the NVMe cache, which lets us capitalize on the significant difference in performance between the NVMe disks and the network-attached block storage to deliver maximum performance. CPU and memory capacity were identical in both environments.
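
To illustrate why grouping data into large blocks matters on high-latency storage, here is a simple cost model in which every request pays a fixed latency plus a transfer time. All numbers below (block sizes, latency, bandwidth) are illustrative assumptions, not measurements or Db2 internals, and the model assumes requests are issued one at a time.

```python
import math


def fetch_time_seconds(total_bytes, block_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to read total_bytes with one serial request per block."""
    requests = math.ceil(total_bytes / block_bytes)
    return requests * latency_s + total_bytes / bandwidth_bytes_per_s


TOTAL_BYTES = 2**30      # 1 GiB of table data to load into the cache
LATENCY_S = 0.05         # assumed per-request latency to object storage
BANDWIDTH = 1e9          # assumed sustained throughput, bytes per second

small_blocks = fetch_time_seconds(TOTAL_BYTES, 32 * 2**10, LATENCY_S, BANDWIDTH)
large_blocks = fetch_time_seconds(TOTAL_BYTES, 32 * 2**20, LATENCY_S, BANDWIDTH)

print(f"32 KiB requests: {small_blocks:8.1f} s")
print(f"32 MiB requests: {large_blocks:8.1f} s")
```

Under these assumptions the per-request latency dominates when requests are small, so reading the same data in large blocks is faster by orders of magnitude. Real systems also issue requests in parallel, which narrows the gap but does not remove the incentive to read in large units.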

Bar chart illustrating that a warm cache is 4.5x faster for average query speed-up ratio, when compared to a cold cache.

In the second set of tests, we ran a single-stream power test based on the 99 queries of the TPC-DS workload, also at the 10 TB scale. In these results, the total speedup achieved with Db2 Warehouse Gen3 was 1.75x compared with the previous generation. Because only one query is executed at a time, the difference in performance is less significant: the network-attached block storage is able to maintain its best performance due to lower utilization than under concurrent workloads like BDI, and warming up our next-generation multi-tier cache takes longer with single-stream access. Even so, the new generation storage won handily. Once the NVMe cache is warm, a re-run of the 99 queries achieves a 4.5x average performance speedup per query compared to the previous generation.

Cloud storage cost savings

Bar chart showing it is 34x less expensive to host Db2 data on object vs block storage.

The use of tiered object storage in Db2 Warehouse Gen3 not only achieves these 4x query performance improvements, but also reduces cloud storage costs by a factor of 34, resulting in a significant improvement in the price-performance ratio compared to the previous generation using network-attached block storage.

Summary

Db2 Warehouse Gen3 delivers an enhanced approach to cloud data warehousing, especially for always-on, mission-critical analytics workloads. The results shared in this post show that our advanced multi-tier caching technology, together with the automatic and unlimited scaling of object storage, led not only to significant query performance improvements (4x faster) but also to massive cloud storage cost savings (34x cheaper). If you are looking for a highly reliable, high-performance cloud data warehouse with industry-leading price-performance, try Db2 Warehouse for free today.



1. Running IBM Big Data Insights concurrent query benchmark on two equivalent Db2 Warehouse environments with 24 database partitions on two EC2 nodes, each with 48 cores, 768 GB memory and a 25 Gbps network interface; one environment did not use the caching capability and was used as a baseline. Result: A 4x increase in query speed using the new capability. Storage cost reduction derived from price for cloud object storage, which is priced 34x cheaper than SSD-based block storage.
