Build 2026 · SIGMOD Best Industry Paper · Early Access July 2026

GPU Accelerated Fabric Data Warehouse: CoddSpeed Complete Guide

Microsoft announced the first fully managed GPU-accelerated cloud data warehouse at Build 2026 on June 2, 2026. The engine is called CoddSpeed. It started as a Microsoft Research prototype running SQL on PyTorch tensors, evolved into a production system, and just won the SIGMOD 2026 Best Industry Paper. No SQL rewrites. No config changes. Up to 7× faster at 64-user concurrency. Early access opens July 2026.

  • 7× faster at 64-user concurrency vs 3 cloud DW providers (May 2026)
  • 5× production improvement at UNC Health on existing workloads (verified customer)
  • SIGMOD 2026 Best Industry Paper for the CoddSpeed paper (aka.ms/coddspeed)
  • July 2026: early access preview opens, no code changes required

What is GPU accelerated Fabric Data Warehouse?

The GPU accelerated Fabric Data Warehouse uses an engine called CoddSpeed to offload supported SQL query operations (large aggregations, complex joins, massive dataset scans) from CPUs to NVIDIA GPUs. No SQL rewrites. Your data stays in place. The engine identifies GPU-eligible operations automatically and routes them transparently. Internal benchmarks (May 2026) show up to 7× faster performance at 64-user concurrency vs three unnamed comparable cloud warehouses. Early access preview opens July 2026. (per Microsoft Fabric Community Blog, June 2, 2026)

Announced: June 2, 2026 · Microsoft Build · Read time: ~12 min · A.J., Data Engineering Researcher · Source: Microsoft Fabric Blog

Why GPU Acceleration Changes the Economics of Data Warehousing

Data warehouses have always run on CPUs. That was fine when data volumes were smaller, concurrency was lower, and the primary consumer of warehouse queries was a scheduled report running overnight. Three things changed that picture.

First, CPU performance gains slowed. Moore’s Law wound down. x86 software optimizations saturated. As data sizes kept growing, cost-per-query started bending the wrong way: more data, same hardware, higher bills. Per the CoddSpeed paper authors at Microsoft: “CPU gains were slowing: Moore’s law was winding down, x86 software optimizations were saturating, and as data sizes kept growing, cost-per-query was bending the wrong way.”

Second, AI flooded data centers with GPU compute. NVIDIA accelerated computing, custom ASICs, NVLink, InfiniBand, and CXL can be orders of magnitude faster than traditional servers at compute, memory bandwidth, and networking. Data centers that had invested in GPU infrastructure for AI inference had this hardware sitting underutilized between inference jobs.

Third, agents changed the query pattern. A human opens a dashboard and looks at it once. An AI agent issues the same complex analytical query dozens of times per minute, with each query in the critical path of a real-time application response. Traditional CPU-based warehouse performance that was acceptable for scheduled reporting becomes a production bottleneck for agentic workloads.


The Core Problem CoddSpeed Solves

Agents, applications, and AI systems are now querying data warehouses continuously, not just at scheduled report times. Every query sits in the critical path of a user experience or agent response. CPU-based warehouses were not designed for this pattern. GPU parallelism handles mixed workloads (many concurrent analytical queries) more efficiently than CPU thread pools, which is why the performance gap grows with concurrency.

CoddSpeed is Microsoft’s answer to all three: take the GPU hardware already in Azure data centers for AI workloads, build a thin abstraction layer that lets the SQL query optimizer route eligible operations to GPUs instead of CPUs, and make the entire thing transparent to the SQL developer writing the queries.

CoddSpeed Architecture: Two Thin Abstraction Layers

CoddSpeed’s architecture is intentionally minimal. The design philosophy, per the paper’s lead author Matteo Interlandi (Principal Scientist Manager, Azure Data GSL): add the minimum number of abstraction layers needed to get query fragments onto GPU hardware without rebuilding the query optimizer or the storage engine.

The result is two layers: CAL and DAL. Both are minimalist by design. Each does exactly one job and delegates everything else to the existing Fabric infrastructure.

CAL

Coprocessor Abstraction Layer

A hardware-agnostic API for offloading query fragments (sub-plans). The Fabric Data Warehouse optimizer serializes eligible query fragments as Substrait plans, hands them to a coprocessor Runtime, feeds data through it (Parquet or SQL Server columnar, zero-copy where possible), and collects results in columnar format.

Coprocessors expose capabilities so the optimizer knows what each one can run. Fragments too large for GPU High Bandwidth Memory use a partitionable execution model with per-partition CPU fallback. CAL does not optimize plans; that stays in Fabric DW’s Cascades optimizer.
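
The capability handshake described above can be pictured with a small sketch. CAL's real interface is not documented in the announcement, so every name here (`Coprocessor`, `Fragment`, `route`, the operator labels) is a hypothetical stand-in. It only illustrates the idea that a fragment is offloaded when a device advertises every operator the fragment needs, and stays on CPU otherwise.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not the CAL API, only an illustration
# of capability-based routing as the article describes it.


@dataclass(frozen=True)
class Coprocessor:
    name: str
    capabilities: frozenset  # operator kinds this device advertises


@dataclass(frozen=True)
class Fragment:
    fragment_id: str
    operators: frozenset  # operator kinds the sub-plan needs


def route(fragment, coprocessors):
    """Return the first coprocessor that supports every operator in the
    fragment, or "cpu" when no device qualifies."""
    for device in coprocessors:
        if fragment.operators <= device.capabilities:  # subset test
            return device.name
    return "cpu"


gpu = Coprocessor("gpu0", frozenset({"scan", "filter", "hash_join", "aggregate"}))
fragments = [
    Fragment("f1", frozenset({"scan", "aggregate"})),  # fully supported
    Fragment("f2", frozenset({"scan", "window"})),     # "window" not advertised
]
placement = {f.fragment_id: route(f, [gpu]) for f in fragments}
print(placement)  # {'f1': 'gpu0', 'f2': 'cpu'}
```

Keeping the decision to a subset check mirrors the division of labor in the text: the device only declares what it can run, and plan optimization stays with the host optimizer.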

DAL

Data Abstraction Layer

A unified caching and shuffle service that hides the transport layer behind a single key/value API: NVLink, Infinity Fabric, InfiniBand, PCIe, and Ethernet are all abstracted away. Applications call one API regardless of which hardware transport is moving data between GPU and CPU.

DAL does not decide what to cache long-term; that stays with the host scheduler. It simply ensures data movement between processing units is as fast as possible regardless of the specific hardware configuration in the Azure data center.
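
As a rough illustration of "one key/value API over many transports", the sketch below keeps callers on `put` and `get` while a transport is chosen behind the scenes. The class names, the `relative_cost` ranking, and the in-memory store are all assumptions made for this example; the announcement does not describe DAL's actual interface or how it selects a transport.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """One way of moving bytes between processing units (hypothetical)."""

    name: str
    relative_cost: int  # lower is preferred; an illustrative ranking only

    @abstractmethod
    def send(self, key, value, store):
        ...


class InMemoryTransport(Transport):
    """Stand-in transport that just writes into a shared dict."""

    def __init__(self, name, relative_cost):
        self.name = name
        self.relative_cost = relative_cost

    def send(self, key, value, store):
        store[key] = value


class DataLayer:
    """Callers see put/get only; the transport choice is hidden."""

    def __init__(self, transports):
        self._transports = sorted(transports, key=lambda t: t.relative_cost)
        self._store = {}

    def put(self, key, value):
        transport = self._transports[0]  # cheapest available transport
        transport.send(key, value, self._store)
        return transport.name

    def get(self, key):
        return self._store[key]


dal = DataLayer([InMemoryTransport("pcie", 3), InMemoryTransport("nvlink", 1)])
used = dal.put("shuffle/part-0", b"\x01\x02")
print(used, dal.get("shuffle/part-0"))
```

Swapping in a different transport changes only the list passed to `DataLayer`, which is the property the article attributes to DAL.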


Why Minimalism Is the Right Design Choice

GPU hardware generations change every 18–24 months. An abstraction layer that exposes too much hardware-specific detail would require rewriting application code with each new NVIDIA architecture. CAL and DAL abstract away the hardware details so CoddSpeed can adopt newer GPUs, FPGAs, or ASICs without changing the SQL engine or the application layer. Per the paper: this engine is “designed to outlive any single chip generation.”

What Operations Get GPU-Offloaded

Not every SQL operation benefits from GPU acceleration. CoddSpeed focuses on the operations that are both compute-intensive (worth the overhead of routing to GPU) and parallelizable (benefit from GPU’s massively parallel architecture):

  • Large aggregations: SUM, COUNT, AVG, MIN, MAX across hundreds of millions to billions of rows, where GPU parallelism dramatically reduces scan time
  • Complex joins: Multi-table analytical joins where the join cardinality is high and the operation is CPU-bound in traditional execution
  • Massive dataset scans: Full or near-full table scans on large fact tables, where GPU memory bandwidth advantages apply
  • Reporting and application workloads: The benchmark covered “common reporting, application, and AI-driven analytics scenarios” per the official announcement

Operations that don’t fit GPU High Bandwidth Memory use per-partition CPU fallback automatically. The query still executes correctly, just on CPU for those fragments.
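
A minimal sketch of per-partition fallback, under stated assumptions: the row budget, the function names, and the "try the device, fall back on a memory error" control flow are all hypothetical. The announcement does not say whether CoddSpeed decides placement up front or reacts to a failed allocation; the point here is only that each partition is placed independently and the final result is the same either way.

```python
HBM_BUDGET_ROWS = 4  # hypothetical per-partition device capacity


def gpu_sum(rows):
    """Stand-in for a device kernel that rejects oversized partitions."""
    if len(rows) > HBM_BUDGET_ROWS:
        raise MemoryError("partition does not fit in device memory")
    return sum(rows)


def cpu_sum(rows):
    """Stand-in for the host execution path."""
    return sum(rows)


def partitioned_sum(partitions):
    """Sum every partition, using the CPU path only where the GPU path fails."""
    total = 0
    placements = []
    for rows in partitions:
        try:
            total += gpu_sum(rows)
            placements.append("gpu")
        except MemoryError:
            total += cpu_sum(rows)
            placements.append("cpu")
    return total, placements


partitions = [[1, 2, 3], [4, 5, 6, 7, 8], [9]]
total, placements = partitioned_sum(partitions)
print(total, placements)  # 45 ['gpu', 'cpu', 'gpu']
```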

Research Origins: From TQP to CoddSpeed

CoddSpeed did not emerge fully formed at Build 2026. It is the production version of a multi-year research project that started with a question that sounded ridiculous at the time: “Could we even run SQL on AI compute runtimes?”

TQP

Tensor Query Processor (Research Prototype)

The original research prototype. Expressed relational operators (SELECT, JOIN, GROUP BY, ORDER BY) as PyTorch tensor operations. The insight: PyTorch already has highly optimized, GPU-accelerated implementations of the mathematical operations underlying relational algebra. Why not use them directly for SQL?
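
To make the "relational operators as tensor operations" idea concrete, here is a deliberately tiny sketch. It uses plain Python lists as stand-ins for tensors so it runs without any dependencies, and it is not TQP code: the function name and data are invented for illustration. The filter becomes an elementwise comparison that yields a mask, and the grouped sum becomes a scatter-style add into one slot per group, which are the kinds of primitives a tensor runtime provides natively.

```python
def filtered_group_sum(keys, values, threshold, num_groups):
    """Roughly: SELECT key, SUM(value) FROM t WHERE value > threshold GROUP BY key.

    keys are dictionary-encoded group ids in the range [0, num_groups).
    """
    # WHERE value > threshold: elementwise comparison producing a boolean mask
    mask = [v > threshold for v in values]

    # GROUP BY + SUM: scatter-add masked values into one accumulator per group
    sums = [0] * num_groups
    for group, value, keep in zip(keys, values, mask):
        sums[group] += value * keep  # a bool multiplies as 0 or 1
    return sums


region_id = [0, 1, 0, 2, 1, 0]      # encoded group key column
amount = [5, 20, 30, 15, 8, 12]     # measure column

print(filtered_group_sum(region_id, amount, threshold=10, num_groups=3))
# [42, 20, 15]
```

Note that the mask is applied by multiplication rather than by branching, so every row follows the same instruction path. That branch-free, data-parallel shape is what makes this style of query execution a natural fit for GPUs.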

TQP proved the concept worked but wasn’t production-ready. It was tightly coupled to PyTorch and specific GPU architectures, making it fragile for a production data warehouse that needed to run on diverse hardware across Azure’s global infrastructure.

CoddSpeed

Production Engine (CoddSpeed)

The hardened, optimized version of TQP, named after Edgar F. Codd, the computer scientist who invented the relational model in 1970. It replaced the PyTorch coupling with the CAL/DAL abstraction layers, making it hardware-agnostic and able to run on NVIDIA GPUs, future FPGAs, and custom ASICs without application-layer changes.

Won the SIGMOD 2026 Best Industry Paper. SIGMOD (Special Interest Group on Management of Data) runs one of the flagship peer-reviewed venues for database research, which makes the award significant academic recognition for production database systems work.

The name “CoddSpeed” is a deliberate nod to Edgar F. Codd, the IBM researcher who published “A Relational Model of Data for Large Shared Data Banks” in 1970, founding the relational database field. The choice of name signals that Microsoft sees this as a generational shift in query processing: the first new execution paradigm since the relational model itself became the foundation for analytical databases.

Benchmark Numbers: What the Data Actually Says

Microsoft published internal benchmark figures in the Build 2026 announcement. Before reading them, understand what they are and what they are not.

What the Benchmarks Cover

Internal testing conducted in May 2026, covering “common reporting, application, and AI-driven analytics scenarios.” The test measured query performance at different concurrency levels against three unnamed comparable cloud data warehouses. The specific benchmark suite (whether TPC-H, TPC-DS, or a proprietary Microsoft benchmark) is not disclosed in the public announcement. These are vendor-published figures with standard vendor benchmark caveats.

| Concurrency Level | GPU-Fabric Performance (vs comparison warehouses) | What This Means |
| --- | --- | --- |
| Single user (1 concurrent) | ~3× faster | Raw throughput advantage at low concurrency. Useful for individual developer or analyst workloads. |
| 16 concurrent users | ~6× faster | Mid-scale dashboard load. Teams of analysts hitting a Power BI report simultaneously. |
| 64 concurrent users | ~7× faster | Enterprise-scale concurrency. Multiple departments, scheduled reports, and interactive queries simultaneously. The gap grows because GPU parallelism handles mixed workload pressure better than CPU thread pools. |
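
To read the multipliers in the table as latency, a quick back-of-the-envelope calculation helps. The 21-second baseline below is an invented number chosen only to keep the arithmetic clean, and the factors are the approximate vendor figures, which compare against other warehouses rather than against your current setup.

```python
# Approximate factors from the vendor-published table; not independently verified.
REPORTED_SPEEDUP = {1: 3.0, 16: 6.0, 64: 7.0}


def accelerated_latency(baseline_seconds, speedup):
    """Latency after applying a speedup factor to a baseline latency."""
    return baseline_seconds / speedup


def latency_reduction(speedup):
    """Fraction of the baseline latency removed by a given speedup."""
    return 1 - 1 / speedup


baseline = 21.0  # hypothetical baseline latency in seconds
for users, factor in REPORTED_SPEEDUP.items():
    seconds = accelerated_latency(baseline, factor)
    saved = latency_reduction(factor)
    print(f"{users:>2} users: {seconds:.1f}s ({saved:.1%} less)")
# prints:
#  1 users: 7.0s (66.7% less)
# 16 users: 3.5s (83.3% less)
# 64 users: 3.0s (85.7% less)
```

The step from 6× to 7× removes only about two more percentage points of latency, so the headline number matters less than whether your own workload lands near the 3× or the 6× end of the range.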

Verified Customer Result

UNC Health (a US healthcare organization) is cited as an early customer reporting up to 5× improvement in query speeds on their existing workloads. This is a real production result on existing data and queries, not a synthetic benchmark. It is the most credible data point in the announcement for evaluating whether the benchmark numbers reflect real-world outcomes.


Read Vendor Benchmarks Carefully

The three comparison providers are not named. The benchmark suite is not disclosed. “Common reporting, application, and AI-driven analytics scenarios” is broad enough to include workload selection that favors the test subject. The concurrency scaling pattern (3× at 1 user, 7× at 64 users) is plausible, since GPU parallelism genuinely scales better under concurrent mixed workloads, but independent validation is not yet available. Run your own workloads in the July 2026 early access preview before making architecture decisions based on these numbers.

Why the Concurrency Gap Makes Sense

The fact that the performance gap grows with concurrency (3× at 1 user, 7× at 64 users) is the most technically credible signal in the benchmark. CPU thread pools compete for shared cache and memory bandwidth under concurrent mixed workloads, so contention rises as more users hit the warehouse simultaneously. GPU architectures with High Bandwidth Memory and NVLink interconnects handle parallelism differently: thousands of lightweight threads and much higher memory bandwidth leave more headroom before concurrent queries start to contend, so throughput can degrade more slowly as the number of simultaneous queries rises.

How It Works in Practice: What Changes for You

The short answer: nothing changes for you. That is the design intention and the most important practical fact about CoddSpeed for Fabric users.

  • Your SQL stays unchanged. No new syntax. No query hints. No GPU-specific functions. The T-SQL you write today in Fabric Data Warehouse runs unchanged with GPU acceleration.
  • Your data stays in place. CoddSpeed reads data from the same Delta-Parquet files in OneLake that the CPU engine reads. No data migration. No separate GPU data store.
  • The optimizer decides what gets GPU-offloaded. The Fabric Data Warehouse Cascades optimizer identifies which query fragments are eligible for GPU execution based on the coprocessor capabilities exposed via CAL. You do not control this routing manually.
  • Non-eligible operations stay on CPU. Operations that don’t benefit from GPU acceleration, or fragments too large for GPU High Bandwidth Memory, execute on CPU with per-partition fallback. The query still completes correctly.
  • Enable via workspace toggle. In the early access preview (July 2026), GPU acceleration will be enabled through a workspace-level setting, with no infrastructure provisioning required.

What This Means for Existing Fabric Warehouse Investments

If you have already built T-SQL queries, star schemas, stored procedures, and Power BI reports on Fabric Data Warehouse, you get GPU acceleration without touching any of them. The investment you have already made in Fabric Warehouse query optimization, partition strategy, and Direct Lake semantic models carries forward completely. CoddSpeed accelerates what you already have.

Hardware Support: Designed for Multiple Accelerator Types

CoddSpeed’s CAL/DAL abstraction was built with future hardware in mind. The architecture supports: NVIDIA GPUs (the initial implementation), FPGAs, custom ASICs, NVLink interconnects, InfiniBand, CXL, and PCIe. Microsoft’s design intent is that as new accelerator hardware enters Azure data centers, whether from NVIDIA, AMD, or custom silicon, CoddSpeed can adopt it without application-layer changes. The GPU implementation is the first, not the only, coprocessor target.

Who Benefits Most from GPU Accelerated Fabric Data Warehouse

| Workload Type | Expected Benefit | Why |
| --- | --- | --- |
| High-concurrency dashboards (16+ simultaneous users) | High (up to 6–7×) | Concurrency is where GPU architecture advantage compounds. Enterprise Power BI deployments with many simultaneous report viewers are the primary beneficiary. |
| Agentic AI queries (continuous analytical queries from agents) | High | Agents don’t sleep. Continuous multi-step analytical queries from AI agents that were previously bottlenecked by CPU concurrency limits run faster at scale. |
| Large aggregations on fact tables (hundreds of millions+ rows) | High | GPU memory bandwidth and parallel compute cores handle large scan+aggregate operations significantly faster than CPU. |
| Complex multi-table joins in analytical queries | High | GPU parallelism reduces join execution time when cardinalities are large. |
| Scheduled batch reports (single-user, off-peak) | Moderate (~3×) | Single-user workloads still benefit but the advantage is smaller. If reports already run in acceptable time windows, the improvement may be less critical. |
| Small lookup queries (point queries, filtered by primary key) | Low | Small queries don’t generate enough compute work to justify GPU routing overhead. These stay on CPU. |
| DML operations (INSERT, UPDATE, DELETE) | Minimal | Write operations are I/O-bound and coordination-dependent, not the compute-intensive pattern that GPU acceleration targets. |

The clearest use case is any organization where Power BI reports serve large numbers of simultaneous viewers (finance teams, operations centers, executive dashboards) and query latency has been a pain point during peak usage hours. GPU acceleration directly addresses this pattern.

The second clear use case is Fabric deployments being extended to serve AI agent workloads, where agents make continuous analytical calls to the warehouse as part of multi-step reasoning chains. This is the pattern Microsoft specifically highlighted in the Build 2026 announcements around agentic data apps.

GPU Accelerated Fabric Warehouse vs Snowflake vs Databricks: The Real Competitive Picture

CoddSpeed positions Microsoft Fabric Data Warehouse as the first fully managed cloud data warehouse with native GPU query acceleration. That is a meaningful claim, but it needs context.

| Platform | Query Acceleration Approach | GPU-Native? | Requires Config? |
| --- | --- | --- | --- |
| Microsoft Fabric DW (CoddSpeed) | GPU offloading via CAL/DAL, transparent to SQL developer | Yes (NVIDIA GPU-native) | Workspace toggle only |
| Snowflake | CPU-based vectorized execution with query optimization | No GPU-native query execution | N/A |
| Databricks (Photon) | CPU-vectorized engine with highly optimized native code for SQL and Spark | No (CPU-vectorized, not GPU) | Enabled by default on supported clusters |
| Google BigQuery | Dremel distributed engine with columnar optimization | No GPU-native SQL execution | N/A |
| Amazon Redshift | CPU-based with AQUA (Advanced Query Accelerator) for some operations | Partial (AQUA uses FPGAs, not GPUs) | AQUA is managed, not user-controlled |

Databricks Photon: The Closest Competitor

Databricks Photon is the most relevant comparison. It is a highly effective CPU-vectorized execution engine, implemented in native code, for SQL and Spark operations, with claimed speedups of 2–8× over standard execution. Photon is excellent for mixed AI/SQL workloads, particularly on Delta Lake, which aligns well with Databricks’ use cases.

The key difference: Photon is CPU-vectorized. CoddSpeed is GPU-accelerated. At high concurrency (64+ simultaneous analytical users or continuous agent queries), GPU architecture’s parallelism model scales differently than CPU vectorization. Whether this advantage holds in independent benchmarks comparing CoddSpeed vs Photon at enterprise concurrency is the question to watch when third-party testing emerges.

Snowflake: The Incumbent Gap

Snowflake has no equivalent GPU-native query execution capability as of June 2026. Snowflake’s architecture uses CPU-based virtual warehouses with Snowflake’s own columnar storage format. For teams currently evaluating Fabric vs Snowflake, GPU acceleration is a meaningful new factor, particularly for high-concurrency analytical workloads where the benchmark advantage compounds most.

The competitive moat from CoddSpeed is real but time-limited. Snowflake and Databricks have the engineering capacity to build GPU query execution. The question is how long it takes: GPU query execution is architecturally complex, and TQP/CoddSpeed represents years of research investment. Microsoft’s head start is measured in years, not months. For current Fabric customers, the practical question is: does GPU acceleration justify staying on Fabric vs migrating to a competitor? For most, the answer is yes, especially given that it requires no code changes.

How to Get Early Access: July 2026 Preview

  1. Watch for the early access sign-up. Early access preview opens July 2026. Microsoft will publish sign-up instructions on the Fabric Updates Blog and the Microsoft Fabric What’s New page. Subscribe to both to catch the announcement the day it goes live.
  2. Register your Fabric workspace for early access. Per the announcement, GPU acceleration will be enabled through a workspace-level toggle, with no infrastructure provisioning and no separate resource allocation. You enable it in workspace settings and existing queries start routing eligible operations to GPU automatically.
  3. Benchmark your own workloads, not Microsoft’s. Microsoft’s internal benchmarks covered “common reporting, application, and AI-driven analytics scenarios” against unnamed providers. Your workload is specific. Run your actual high-concurrency queries, the ones that are currently slow or expensive, and measure before and after. High-aggregation, high-concurrency patterns will show the largest improvements. Small lookup queries will show minimal change.
  4. Focus testing on concurrency scenarios. The benchmark advantage compounds with concurrency. Single-user testing (3× improvement) won’t capture the full impact. Test with realistic concurrent user counts: simulate your peak dashboard load of 10, 20, or 50 simultaneous users to see where GPU acceleration has the most business impact.
  5. Check capacity requirements. GPU-accelerated compute is premium infrastructure. Pricing for the GPU-accelerated tier has not been announced as of June 2026. Evaluate cost vs performance improvement when pricing is published: a 7× faster query that costs 4× more per query-second may or may not improve your total cost of ownership depending on workload patterns.
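
For steps 3 and 4, a small harness along the following lines is one way to compare latency at different concurrency levels before and after enabling the toggle. It is a sketch under assumptions: `run_query` is a placeholder that only sleeps, and you would replace its body with a call through whatever database client you already use. The user counts and the number of queries per user are arbitrary.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Placeholder for one warehouse query. Replace the sleep with a real
    call through your own database client."""
    time.sleep(0.01)


def timed_call(fn):
    """Return the wall-clock duration of one call to fn, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure(fn, concurrent_users, queries_per_user=3):
    """Run fn from `concurrent_users` worker threads and summarize latency."""
    total = concurrent_users * queries_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(lambda _: timed_call(fn), range(total)))
    return {
        "users": concurrent_users,
        "queries": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


for users in (1, 16, 64):
    print(measure(run_query, users))
```

Run the same script once against the workspace with acceleration off and once with it on, using identical queries and data, and compare the median and worst-case latency at each concurrency level rather than a single averaged number.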

Pricing: Not Yet Announced

As of June 12, 2026, Microsoft has not published pricing for GPU-accelerated Fabric Data Warehouse. It will likely be a premium tier above standard F-SKU capacity pricing. Total cost of ownership comparisons against Snowflake and Databricks are not possible until pricing is published. Factor this into your architecture evaluation timeline, because performance comparisons without cost comparisons are incomplete.
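
Until pricing exists, the trade-off can only be reasoned about with placeholders. The sketch below uses the 4× price multiplier that this article uses purely as an illustration in the early access steps; it is not a Microsoft figure. If compute is billed by time, the relative cost of one query is the price multiplier divided by the speedup, so the break-even point is where the two are equal.

```python
def relative_cost_per_query(speedup, price_multiplier):
    """Cost of one query relative to the unaccelerated baseline, assuming
    compute is billed in proportion to execution time."""
    return price_multiplier / speedup


# 4x is an illustrative placeholder; actual pricing has not been announced.
print(round(relative_cost_per_query(speedup=7, price_multiplier=4), 3))  # 0.571
print(round(relative_cost_per_query(speedup=3, price_multiplier=4), 3))  # 1.333
```

Under these assumed numbers, a workload that really sees 7× would cost roughly 43% less per query, while one that sees only 3× would cost about a third more. This is why measuring your own speedup matters more than the headline figure.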

Frequently Asked Questions

What is the GPU-accelerated Fabric Data Warehouse?
The GPU-accelerated Fabric Data Warehouse uses an engine called CoddSpeed to offload supported SQL query operations from CPUs to NVIDIA GPUs transparently, without SQL rewrites or data migration. It was announced at Microsoft Build 2026 on June 2, 2026, and its research paper won the SIGMOD 2026 Best Industry Paper Award. Internal benchmarks show up to 7× faster performance at 64-user concurrency. Early access preview opens July 2026.
What is CoddSpeed?
CoddSpeed is the GPU-accelerated query execution engine inside Fabric Data Warehouse. It originated as TQP (Tensor Query Processor), a Microsoft Research prototype that expressed relational operators as PyTorch tensor operations. The production version uses two minimalist abstraction layers: CAL (Coprocessor Abstraction Layer) for routing query fragments to GPU hardware, and DAL (Data Abstraction Layer) for unified caching and data movement. It is named after Edgar F. Codd, who invented the relational model in 1970.
Do I need to rewrite my SQL for GPU acceleration?
No. GPU acceleration in Fabric Data Warehouse is completely transparent. Your SQL queries run unchanged. Your data stays in OneLake. The Cascades optimizer automatically identifies eligible query fragments and routes them to GPU. Operations that don’t fit GPU High Bandwidth Memory use per-partition CPU fallback. Enabling GPU acceleration requires only a workspace-level toggle in the early access preview.
How much faster is GPU accelerated Fabric Data Warehouse?
Per Microsoft’s internal benchmarks (May 2026) against three unnamed comparable cloud warehouses: approximately 3× faster at single-user concurrency, ~6× faster at 16 concurrent users, and ~7× faster at 64 concurrent users. UNC Health reported 5× improvement on existing production workloads. These are vendor-published internal benchmarks, and independent third-party validation is not yet available. Run your own workload benchmarks during the July 2026 early access preview.
How does GPU-accelerated Fabric compare to Snowflake and Databricks?
Microsoft positions CoddSpeed as the first fully managed cloud data warehouse with native GPU query acceleration. Snowflake uses CPU-based query execution with no GPU-native equivalent. Databricks Photon is a CPU-vectorized engine (2–8× speedup claims), highly effective but not GPU-accelerated. If Microsoft’s benchmarks hold in independent testing, this is a meaningful architectural differentiation for high-concurrency workloads. Independent third-party comparisons are not yet available as of June 2026.
When is GPU accelerated Fabric Data Warehouse available?
Early access preview opens July 2026. The capability was announced at Microsoft Build 2026 on June 2, 2026. General availability timing has not been announced. Pricing has not been announced. Watch the Microsoft Fabric Community Blog and the What’s New page for the July 2026 early access sign-up.

Accuracy Disclaimer

All benchmark figures (3×, 6×, 7× performance improvements, UNC Health 5× result) are sourced from Microsoft’s internal benchmarks as published in the official Microsoft Fabric Community Blog announcement (June 2, 2026) and the CoddSpeed research post (June 2, 2026). The three comparison providers are not named by Microsoft. No independent third-party benchmark validation is available as of June 2026. GPU acceleration enters early access preview July 2026, and pricing has not been announced. Verify current availability and pricing at the Microsoft Fabric What’s New page before architecture decisions. UIG Data Lab is an independent publication, not affiliated with or endorsed by Microsoft Corporation or NVIDIA.

A.J., Data Engineering Researcher & Technical Writer · UIG Data Lab

A.J. researches and writes about data engineering, analytics architecture, Microsoft Fabric, and modern cloud data platforms. Coverage spans Microsoft Fabric, Power BI, Azure Data Engineering, Databricks, Snowflake, Apache Spark, dbt, Apache Airflow, and modern cloud data infrastructure. The focus is practitioner-level content that helps data professionals understand platform capabilities, evaluate technology decisions, optimize costs, and implement practical solutions using official documentation, product updates, community insights, and industry best practices. His writing covers real decisions from real deployments, not documentation rewrites.
