Accelerating query performance with watsonx.data Presto C++ and Intel Sapphire Rapids processors on AWS

The long-standing partnership between IBM and Intel has led to significant advancements in database performance over the past 25 years. Based on internal testing by IBM, the latest-generation Intel® Xeon® Scalable processors, combined with Intel software, have the potential to drive enhanced performance for IBM® watsonx.data.

IBM watsonx.data is a hybrid, governed data lakehouse optimized for data, analytics and AI workloads. It drives business analytics with engines such as Presto and Spark, and it provides a flexible approach and a unified view of your data across hybrid cloud environments.

In June, IBM released Presto C++, the next generation of Presto, developed by open-source community members at Meta, IBM, Uber and others. This query engine is developed in collaboration with Intel using Velox, an open-source C++ native acceleration library designed to be composable across multiple compute engines. IBM also paired the release of the Presto C++ engine with a query optimizer, based on decades of experience, that further accelerates query performance through optimized query rewrites.

Amazon Elastic Compute Cloud (EC2) R7iz instances are memory-optimized, high CPU performance instances. They are the fastest 4th Generation Intel Xeon Scalable-based (Sapphire Rapids) instances in the cloud, with 3.9 GHz sustained all-core turbo frequency. R7iz instances can deliver up to 20% better performance than previous generation z1d instances and reduce total cost of ownership (TCO). They include built-in accelerators like Intel® Advanced Matrix Extensions (Intel® AMX) that offer a much-needed alternative in the market for customers with growing AI workload demand.

The combination of high CPU performance and high memory footprint makes R7iz instances suited for front-end Electronic Design Automation (EDA), relational database workloads with high per-core licensing fees, and financial, actuarial and data analytics simulation workloads.

Intel and IBM have worked closely to bring open-source software optimizations to Presto, Presto C++ and watsonx.data. Combined with the hardware improvements, 4th Gen Intel Xeon processors have delivered favorable results on watsonx.data.

In internal testing by IBM, IBM watsonx.data with Presto C++ v0.286 and the query optimizer on AWS ROSA, running on 4th Generation Intel Xeon Scalable processors, delivered better price performance than Databricks' Photon engine, with better query runtime at similar cost. The comparison is derived from public 100TB TPC-DS Query benchmarks (see note below).

Try IBM watsonx.data to experience the future of data


* Note: This claim is based on IBM internal testing of Presto C++ v0.286 on an AWS r7iz.4xlarge EC2 instance with a 4th Generation Intel Xeon Scalable-based processor (Sapphire Rapids) with 1 master + 84 worker nodes, 1260 vCPUs, 10.08 TB Memory, Up to 12.5 Gbps Network against public Databricks 100TB TPC-DS Query benchmarks published in 2021 with 1 master + 256 worker nodes, 2112 vCPUs, 16.1 TB Memory, 528.2 TB of total storage, and 10GB Network. Pricing calculations are based on IBM watsonx.data pricing as of 5/7/24 and Databricks published pricing for Photon as of 5/7/24. Results are based on testing conditions and pricing as of the dates shown. Actual costs and performance will vary depending on individual client configurations and conditions. Results are derived from the 100TB TPC-DS Query benchmark and as such are not comparable to published Databricks SQL 8.3 benchmark results, as results do not comply with the 100TB TPC-DS Query benchmark specification.
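
To put the two cluster configurations from the note side by side, the short Python sketch below computes how the watsonx.data test cluster compares in size with the published Databricks cluster. It uses only the totals stated in the note. The `Cluster` class and `compare` function are illustrative names, not part of any IBM or Databricks tooling, and the ratios describe hardware footprint only, not price or runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Cluster totals as quoted in the note (hypothetical helper type)."""
    name: str
    worker_nodes: int
    vcpus: int
    memory_tb: float


# Figures copied from the note; nothing else is assumed about either setup.
watsonx = Cluster("watsonx.data Presto C++ v0.286", 84, 1260, 10.08)
databricks = Cluster("Databricks Photon (2021 benchmark)", 256, 2112, 16.1)


def compare(a: Cluster, b: Cluster) -> dict[str, float]:
    """Return the size of cluster `a` as a fraction of cluster `b`."""
    return {
        "worker_nodes": a.worker_nodes / b.worker_nodes,
        "vcpus": a.vcpus / b.vcpus,
        "memory_tb": a.memory_tb / b.memory_tb,
    }


ratios = compare(watsonx, databricks)
for resource, fraction in ratios.items():
    print(f"{resource}: {fraction:.1%} of the Databricks configuration")
```

By these totals, the IBM test cluster used roughly a third of the worker nodes and about 60% of the vCPUs and memory of the Databricks configuration. The price performance claim itself additionally depends on the pricing and runtimes referenced in the note, which are not reproduced here.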
