Architecture Decision · Microsoft Fabric

Fabric Lakehouse vs Data Warehouse Best Practices — 2026 Guide

The official Microsoft decision framework, the June 2026 updates that change the calculus — GPU-accelerated Warehouse, Materialized Lake Views GA, ALTER COLUMN, Time Travel at SQL analytics endpoint, Datawarehouse Monitor — and the production patterns that enterprise teams actually use.

Quick Answer

Both Lakehouse and Data Warehouse store Delta tables on OneLake — the storage choice is not the decision. Microsoft’s official framework reduces to three questions: Do you prefer Spark or T-SQL? Do you need multi-table ACID transactions? Is your data structured only, or mixed? As of June 2026, the Warehouse gained GPU acceleration (CoddSpeed, SIGMOD 2026 Best Industry Paper) delivering up to 7× faster performance at high concurrency — early access preview July 2026. The Lakehouse gained Materialized Lake Views GA with cross-workspace extended lineage, closing the Gold-layer automation gap that previously pushed teams toward Warehouse by default.

📅 Last verified: June 2026 ⏱ ~13 min read ✍️ A.J., Data Engineering Researcher & Technical Writer · UIG Data Lab 🔗 Source: Microsoft Learn

Fabric Lakehouse vs Data Warehouse: The Decision Framework

Microsoft publishes a formal decision guide at learn.microsoft.com/fabric/fundamentals/decision-guide-lakehouse-warehouse. It reduces to a three-question tree. The framework applies regardless of data volume — it is about skill set, transaction requirements, and data shape, not scale.

Decision CriterionAnswer → Use LakehouseAnswer → Use Warehouse
Development model
How does your team prefer to work?
Spark, PySpark, Scala, Python notebooksT-SQL, SQL Server tooling (SSMS, Azure Data Studio)
Multi-table transactions
Do you need full ACID across multiple tables?
No — single-table ACID via Delta is sufficientYes — you need INSERT/UPDATE/DELETE/MERGE across multiple tables atomically
Data complexity
What formats are you analysing?
Mixed: unstructured, semi-structured (JSON, logs, images), or don’t know yetStructured only: relational data with stable schemas
Source: Microsoft Fabric decision guide — choose between Warehouse and Lakehouse
ℹ️
You Can Add Either One Later

Microsoft’s official guidance explicitly states that you can always add one or the other at a later point should your business needs change. Starting with Lakehouse does not lock you out of Warehouse. Both use the same Polaris SQL engine for T-SQL queries. Regardless of where you start, compute is fully compatible and data stays in the same OneLake location.

Field note — A.J., Data Engineering Researcher

The most common mistake in this decision is choosing based on team size or data volume rather than the three criteria above. A team of two data engineers can be building a Warehouse-first architecture. A team of twenty can be entirely Lakehouse-first. The decision is about what tools your team knows and what transactional guarantees your Gold layer needs — not how big you are. Defaulting to Warehouse because it “sounds more enterprise” and defaulting to Lakehouse because it “sounds more modern” are both wrong reasons to choose.

OneLake — Why the Storage Is No Longer the Decision

In every data platform that predates Fabric, choosing a storage engine was the primary architectural decision: Parquet on ADLS vs. Synapse Dedicated Pool vs. Databricks Delta vs. Snowflake internal storage. Each choice created a silo. Moving data between them required ETL pipelines, refresh schedules, and duplicate storage costs.

OneLake changes this completely. Both Lakehouse and Data Warehouse store their data as Delta Parquet files in OneLake. There is one physical location. The Lakehouse Spark engine, the Lakehouse SQL analytics endpoint, and the Warehouse T-SQL engine all read the same files. Power BI Direct Lake reads the same files. No copies. No synchronisation.

The decision between Lakehouse and Warehouse is therefore a decision about which compute surface and access model fits your workload — not which storage system to commit to. You can switch, combine, or add later without moving data.

CapabilityLakehouseData Warehouse
Storage formatDelta Parquet in OneLakeDelta Parquet in OneLake
Primary computeApache Spark (Runtime 1.3 — Spark 3.5)Polaris distributed T-SQL engine
SQL accessRead-only SQL analytics endpoint (T-SQL)Full read/write T-SQL with DML
DML (INSERT/UPDATE/DELETE)Via Spark only — not in SQL analytics endpointFull T-SQL DML — multi-table transactions
Power BI Direct LakeSupported — SQL analytics endpoint must be refreshedSupported — Refresh SQL Endpoint activity (GA June 2026)
Shortcuts (external data)Full support — ADLS, S3, GCS, other LakehousesRead via cross-database queries to Lakehouse
Materialized Lake ViewsGA March 2026 — SQL and PySpark, cross-workspace lineage (June 2026)Not available
Time TravelVia Spark or SQL analytics endpoint (Preview, June 2026)T-SQL — FOR TIMESTAMP AS OF — configurable retention (default 30 days)

Fabric Lakehouse — What It Is and When to Use It

A Fabric Lakehouse is a data architecture platform built on Delta tables in OneLake. It has two primary access surfaces: the Spark engine (via notebooks and Spark Job Definitions) for code-first data engineering and data science, and the SQL analytics endpoint (a read-only T-SQL interface) for SQL-based consumption without Spark.

When you create a Lakehouse, Fabric automatically creates the SQL analytics endpoint and a default Power BI semantic model alongside it. SQL analysts and Power BI users get immediate access to the same data the Spark engineer just wrote — with no separate load step.

Lakehouse Is the Right Choice When

  • Your team writes PySpark, Scala, or Python notebooks daily
  • Data includes JSON, logs, images, or other semi-structured/unstructured formats
  • Schema evolves rapidly and schema-on-read flexibility is needed
  • Workload involves ML feature engineering, model training, or data science experiments
  • Bronze and Silver medallion layers with fast-changing raw data
  • You want Materialized Lake Views for automated incremental Gold-layer materialisation
⚠️

Lakehouse Limitations to Know

  • SQL analytics endpoint is read-only — no DML via T-SQL
  • Multi-table T-SQL transactions are not supported — use Spark Delta MERGE for upserts
  • ALTER COLUMN is not available at the SQL analytics endpoint
  • Time Travel via T-SQL at SQL analytics endpoint is Preview (June 2026) — limited by VACUUM retention settings per table
  • OneSecurity (OneLake data access roles) is incompatible with Direct Lake semantic models on the same Lakehouse (known issue, June 2026)

The Lakehouse SQL analytics endpoint gets a Refresh SQL Endpoint Activity (GA, June 2026) as a native pipeline step — refresh the endpoint on demand inside the same pipeline as your ETL load, so downstream BI queries always read the latest state without a separate scheduling workaround. See the full deep-dive in the Fabric Lakehouse Tutorial.

Fabric Data Warehouse — What It Is and When to Use It

A Fabric Data Warehouse is a lake-centric, distributed T-SQL engine built on Delta tables in OneLake. It provides the full SQL Server-compatible T-SQL surface: DDL, DML (INSERT, UPDATE, DELETE, MERGE), stored procedures, views, functions, schemas, and cross-database querying via three-part names. Teams migrating from SQL Server, Azure Synapse Dedicated Pools, or Teradata will recognise the development model immediately.

Warehouse Is the Right Choice When

  • Your team works primarily in T-SQL and SQL Server tooling (SSMS, Azure Data Studio)
  • You need full multi-table ACID transactions via T-SQL
  • Workload is structured data only with stable schemas and dimensional models
  • Gold-layer star schemas, conformed dimensions, and BI-ready fact tables
  • High-concurrency interactive query workloads serving many simultaneous Power BI users
  • You need ALTER COLUMN, Time Travel via T-SQL, or Datawarehouse Monitor for query observability
🚀

GPU Acceleration — Build 2026 (CoddSpeed)

  • NVIDIA accelerated computing built directly into the Warehouse query engine
  • No query rewrites, no infrastructure setup, no cluster to size
  • Enable in workspace settings — query optimizer routes eligible queries to GPU automatically
  • Up to 7× faster than three comparable cloud warehouses at 64-user concurrency (internal benchmarking, May 2026)
  • UNC Health: 5× production improvement on existing workloads
  • CoddSpeed: SIGMOD 2026 Best Industry Paper — first fully managed SaaS warehouse with GPU acceleration
  • Early access preview: July 2026
Unified Analyze Data Menu — June 2026

Fabric now provides a unified Analyse data with menu that gives a consistent way to move from data to analysis across Lakehouse, Warehouse, and Eventhouse. Instead of navigating different menus per workload, analysis starts from a single, predictable entry point. The SQL Query Editor in Warehouse also expanded with integrated experiences: OneLake Analytics (KQL and Spark on warehouse data), Copy Job creation directly from the editor, Power BI Direct Lake over OneLake semantic model creation, and Fabric Ontologies for defining business concepts over warehouse tables.

June 2026 — What Changed for Both

The June 2026 release (Build 2026, June 2) and the June Feature Summary (June 2, 2026) introduced substantial changes to both Lakehouse and Warehouse that affect architecture decisions made in 2026.

Data Warehouse — June 2026 Updates

FeatureStatusWhat It Changes
GPU-Accelerated Warehouse (CoddSpeed)Early Access — July 2026Up to 7× faster analytical queries at high concurrency. Enable in workspace settings. SIGMOD 2026 Best Industry Paper. No query rewrites required.
ALTER COLUMNGA — June 2026Change column length, precision, and scale with standard T-SQL syntax. Takes effect instantly with no data rewrite. Eliminates CTAS-based workarounds in CI/CD and migration workflows.
Time Travel via T-SQLPreviewQuery warehouse tables as they existed at any prior point in time using FOR TIMESTAMP AS OF. Default 30-day configurable retention window. Cross-warehouse time travel within the same workspace.
Time Travel at SQL Analytics EndpointPreview — June 2026Extends T-SQL time travel to the Lakehouse SQL analytics endpoint. Retention limited by per-table VACUUM settings. Important: this feature now means both Lakehouse and Warehouse support Time Travel via T-SQL.
Datawarehouse Monitor (formerly Query Activity)Preview — June 2026Single place to monitor running and historical queries. Per-query metrics: CPU time, data scanned, cache usage. Drill into query run history without writing T-SQL. Replaces the previous Query Activity view.
SQL Analytics Endpoint CI/CD (DacFx)Preview — June 2026Version-control and deploy SQL analytics endpoint objects via DacFx. Item Definition APIs enable programmatic CRUD for CI/CD pipeline integration and environment management.
Source: Everything new in Fabric Data Warehouse at Build 2026 · Fabric June 2026 Feature Summary

Lakehouse — June 2026 Updates

FeatureStatusWhat It Changes
Materialized Lake Views (MLV) extended lineageJune 2026MLVs now span Lakehouse boundaries across workspaces. One schedule in Gold cascades through Silver and Bronze automatically. Faulted nodes flag missing dependencies before execution. Removes the primary orchestration reason for choosing Warehouse at the Gold layer.
Refresh SQL Endpoint ActivityGA — June 2026Refresh the Lakehouse SQL analytics endpoint as a native pipeline step. Sync refreshes with ETL loads so Direct Lake semantic models always reflect the latest data.
OneLake Storage Tiers + Lifecycle ManagementPreview — June 2026Hot/Cool/Cold tiers with automatic lifecycle policies. Move Bronze archives to Cold storage automatically — reduce cost without deleting compliance data.
High Concurrency mode for LakehouseGA — January 2026Up to 5 independent Lakehouse jobs share one Spark session. Reduces startup wait time on smaller F-SKU capacities. Material improvement for interactive development on F2–F8.
Lakehouse Maintenance Activity (VACUUM + OPTIMIZE)PreviewRun VACUUM and OPTIMIZE as native pipeline steps — no separate maintenance notebook required. Schedule nightly in the same pipeline as ETL load.

Medallion Architecture — Layer Mapping in 2026

Microsoft’s recommended pattern places Lakehouse at Bronze and Silver, with a choice at Gold between Lakehouse-with-MLV and Data Warehouse. The June 2026 updates to both shift that choice in important ways.

LayerRecommended EnginePrimary ToolsWhen to Use Warehouse at This Layer
Bronze (Raw landing)LakehouseData Factory Copy Job, Eventstream, Data Mirroring, Spark notebooksAlmost never — Lakehouse handles all source formats; Bronze data is rarely structured-only at landing
Silver (Cleaned, conformed)LakehouseSpark notebooks (Delta MERGE), Materialized Lake Views, Dataflow Gen2Rarely — only if Silver data is already structured-only and BI teams need direct T-SQL write access to Silver tables
Gold (Curated, BI-ready)Either — depends on requirementsMLV (Lakehouse) or T-SQL star schema (Warehouse) + Direct Lake Power BIUse Warehouse at Gold when: multi-table T-SQL transactions are required; BI team works exclusively in T-SQL; high-concurrency GPU-accelerated queries needed; or schema requires ALTER COLUMN in CI/CD

Better Together — The Standard Enterprise Pattern

The most common production pattern in Fortune 500 Fabric deployments: Lakehouse for Bronze and Silver (Spark-based ETL, CDC, feature engineering), Data Warehouse for Gold (T-SQL star schemas, dimensional models, certified semantic models). All layers remain in OneLake — the Warehouse reads Lakehouse Silver tables via cross-database queries using three-part names, and the Gold layer adds only the curated dimensional model on top, not a separate copy of the data.

ℹ️
MLV at Gold vs Warehouse at Gold — How to Choose in 2026

Before Materialized Lake Views GA (March 2026), teams often chose Warehouse at Gold purely for automated materialisation — Spark notebooks required manual scheduling that a Warehouse stored procedure did not. MLVs with extended lineage (June 2026) eliminate that gap: the MLV automatically materialises Gold tables from Silver, dependency-aware, incremental, cross-workspace. Choose Warehouse at Gold when you additionally need full T-SQL DML, multi-table transactions, ALTER COLUMN in CI/CD, or the GPU-accelerated query engine at high concurrency. Choose Lakehouse with MLV at Gold when your Gold layer is primarily SQL-definable aggregations with no multi-table write requirements and BI teams consume via Direct Lake.

Ingestion — Which Path for Which Target

Every Fabric ingestion method can write to both Lakehouse and Warehouse. The right choice depends on the source type, latency requirement, and whether the team prefers code-first or low-code tooling — not on the destination engine.

Ingestion MethodBest TargetTypical Use
Copy Job Activity (GA April 2026)Lakehouse BronzeCDC from SQL Server, Azure SQL, Azure SQL MI (GA). Bulk load, incremental, SCD Type 2 for Warehouse. Auto-schema detection, no manual mapping.
Data Pipelines — Copy ActivityLakehouse or Warehouse90+ connectors, custom schema mapping, fault tolerance. Use when Copy Job does not expose the configuration needed.
Dataflow Gen2Lakehouse Silver or Warehouse GoldVisual Power Query with 200+ connectors. Low-code preparation and transformation. Writes Delta tables to OneLake destination.
Spark NotebooksLakehouse Bronze or SilverComplex ETL, CDC with Delta MERGE, custom schema enforcement, ML feature engineering. Cannot write directly to Warehouse tables — land in Lakehouse, cross-query from Warehouse.
EventstreamLakehouse or EventhouseStreaming from Event Hubs, IoT Hub, Kafka, CDC sources. Routes to Lakehouse Delta tables and/or Eventhouse KQL simultaneously.
Data MirroringLakehouse (as Delta tables in OneLake)Near-real-time zero-ETL replication from Azure SQL DB, Cosmos DB, Snowflake, Databricks Unity Catalog. CDC for SQL estates now GA.
⚠️
Spark Cannot Write Directly to Warehouse Tables

This is the most common architectural surprise for teams migrating from Databricks or Synapse Spark. Spark notebooks in Fabric write to Lakehouse Delta tables — they cannot write directly to Warehouse tables via spark.write.saveAsTable(). The standard pattern: Spark writes to a Lakehouse Silver table, the Warehouse reads it via cross-database query using three-part naming (silverlakehouse.schema.tablename) or loads it via a T-SQL CREATE TABLE AS SELECT or INSERT INTO ... SELECT.

Governance and Security — Key Differences

Both Lakehouse and Warehouse use OneLake at the storage layer, so Purview lineage, sensitivity labels, and OneLake audit logs apply to both. The difference is in how each engine exposes access control to teams who need fine-grained permissions.

🔐

Lakehouse Security Model

  • Workspace RBAC + item-level permissions
  • OneLake security roles (OneSecurity — Preview) — folder, table, row, column level
  • RLS at semantic model level for Direct Lake
  • Known limitation: OneSecurity blocks Direct Lake semantic models — manage RLS at semantic model level until resolved
  • Sensitivity labels via Microsoft Purview inherit automatically from M365
🔐

Warehouse Security Model

  • Full T-SQL security: schemas, roles, GRANT/REVOKE/DENY
  • Dynamic data masking for column-level obfuscation
  • Row-level security via T-SQL predicates
  • More familiar to DBAs from SQL Server and Azure Synapse
  • Warehouse permissions do not cascade to semantic model RLS automatically — configure at both layers

For a complete governance strategy covering both engines — OneLake catalog Govern tab, Domains, Purview DLP, and the Outbound Access Protection GA — see the Microsoft Fabric Data Governance Tutorial.

Cost and Performance — What Drives the Numbers

Both Lakehouse and Warehouse consume CUs from the same F-SKU capacity pool. There is no separate pricing tier per engine. Cost differences come from workload patterns, not from being charged differently per engine type.

FactorLakehouse (Spark)Data Warehouse (T-SQL)
Interactive query latency5–15 second Spark cold start (Starter Pools). Subsequent queries fast within session.No cold start — serverless SQL responds immediately. Preferred for interactive BI with many users.
High-concurrency BISQL analytics endpoint handles concurrent T-SQL. GPU acceleration not available.Optimised for high-concurrency. GPU acceleration (CoddSpeed) delivers up to 7× improvement at 64-user concurrency — early access July 2026.
Heavy transformation (TB+)Spark with Starter Pools — cost-effective for large-scale batch jobs. Partition, broadcast, and V-Order for performance.T-SQL engine handles large loads well — no advantage over Spark for volume. Choose based on skill set.
ML and feature engineeringSpark — full Python ecosystem, scikit-learn, MLflow. Notebooks access all Lakehouse data.Not the right tool — use Lakehouse for ML workloads even in a Warehouse-primary architecture.
Storage cost optimisationOneLake storage tiers (Preview, June 2026) — move cold Bronze data to Cool/Cold tier automatically.Configurable retention for Time Travel (default 30 days). Longer retention increases storage cost.

For detailed CU estimation and workload-specific cost modelling, the Microsoft Fabric Pricing Calculator covers both Lakehouse Spark CU consumption and Warehouse SQL CU consumption by workload type. For production stability, burst management, and capacity right-sizing, the Fabric Capacity Optimization guide covers the scheduling and pause strategies that prevent throttling.

Frequently Asked Questions

No. In Microsoft Fabric, Lakehouse and Data Warehouse are designed to work together — not to replace each other. Both store Delta tables on OneLake. The choice is about compute surface, not storage. Lakehouse is optimised for Spark-based engineering and data science. Warehouse is optimised for T-SQL, multi-table transactions, and high-concurrency BI. Microsoft’s official guidance explicitly states you can add either one at any point — starting with Lakehouse does not preclude Warehouse later, and both use the same Polaris SQL engine for T-SQL queries. Source: Microsoft Learn — What is Fabric Data Warehouse.
GPU-accelerated Fabric Data Warehouse (CoddSpeed) was announced at Build 2026 (June 2, 2026). It uses NVIDIA accelerated computing built directly into the Warehouse query execution engine. The query optimizer automatically routes eligible analytical queries to the GPU — no query rewrites, no infrastructure configuration, no new system to manage. Enable it in workspace settings and it applies to all SQL Analytics Endpoints and Data Warehouses in that workspace. In internal benchmarking conducted in May 2026, it ran up to 7× faster than three comparable cloud data warehouses at 64-user concurrency. UNC Health reported 5× improvement on existing production workloads. The underlying research — CoddSpeed: Hardware Accelerated Query Processing in Microsoft Fabric — won the SIGMOD 2026 Best Industry Paper award. Early access preview is scheduled for July 2026. Source: Microsoft Fabric blog.
Microsoft’s official decision framework uses three criteria. Development model: if your team prefers Spark/PySpark/Python → Lakehouse; if your team prefers T-SQL → Warehouse. Multi-table transactions: if you need full ACID across multiple tables via T-SQL → Warehouse; if single-table Delta MERGE is sufficient → Lakehouse. Data complexity: if data includes unstructured or semi-structured formats → Lakehouse; if structured data only → Warehouse. When all three answers point the same direction, the decision is clear. When they point different directions — common in teams with mixed skill sets building medallion architectures — the Better Together pattern (Lakehouse for Bronze/Silver, Warehouse for Gold) is the standard approach. Source: Microsoft Fabric decision guide.
Materialized Lake Views (MLV) reached General Availability in March 2026. They let you define transformation logic in SQL or PySpark — Fabric materialises the result as a Delta table and manages incremental refresh, dependency tracking, and scheduling automatically. Extended lineage (June 2026) means MLVs now span Lakehouse boundaries across workspaces: one schedule in a Gold Lakehouse cascades through Silver and Bronze automatically. Before MLVs, teams often chose Warehouse at Gold purely for automated materialisation. MLVs close that gap — a Lakehouse with MLVs can now handle many of the same Gold-layer use cases without Warehouse. Choose Warehouse at Gold when you additionally need multi-table T-SQL transactions, ALTER COLUMN in CI/CD, or the GPU-accelerated query engine at high concurrency. See the Fabric Lakehouse Tutorial for a full MLV deep-dive.
No. Spark notebooks in Fabric write to Lakehouse Delta tables — they cannot write directly to Warehouse tables via spark.write.saveAsTable() or spark.write.format("delta") targeting a Warehouse. The standard pattern in a Better Together architecture: Spark transforms and writes to a Lakehouse Silver table, then the Warehouse loads or references it via cross-database T-SQL query using three-part names (silverlakehouse.schema.tablename), or via a T-SQL INSERT INTO gold_table SELECT * FROM silverlakehouse.schema.tablename. This is not a workaround — it is the documented Microsoft pattern for combining both engines.
Both approaches are valid and Microsoft supports both. Separate Lakehouses per tier (Bronze Lakehouse, Silver Lakehouse, Gold Lakehouse) improves security boundaries — workspace RBAC means Bronze access can be restricted to engineering only, while Gold access is open to analysts — and makes lineage and ownership clearer in the OneLake catalog. The trade-off is higher item count and more cross-Lakehouse shortcut configuration. Single Lakehouse with schemas (bronze schema, silver schema, gold schema) reduces management overhead but requires strict governance documentation and means all workspace members see all tiers. For small teams with three engineers, a single Lakehouse with schemas is often simpler. For larger organisations with separate team ownership per tier, separate Lakehouses per tier with MLV extended lineage (June 2026) is the cleaner architecture.
Use Eventhouse (backed by KQL/Azure Data Explorer) when your workload involves streaming event data and you need sub-second interactive query latency on time-series or high-granularity data — IoT telemetry, clickstream analytics, security event correlation, or operational dashboards. The Microsoft data store decision guide explicitly positions Eventhouse for streaming event data and high-granularity interactive analytics. Eventhouse is not a replacement for Lakehouse or Warehouse — it is a third path for a specific workload type. Eventstream can route data to Eventhouse and Lakehouse simultaneously, so operational real-time dashboards can run on Eventhouse while historical batch analysis runs on the same data in Lakehouse. Source: Microsoft Fabric decision guide — choose a data store.
⚠ Accuracy Disclaimer

Feature descriptions are based on official Microsoft Learn documentation, the Fabric June 2026 Feature Summary, and the Build 2026 Azure blog (published June 2, 2026). GPU-accelerated Warehouse early access is scheduled for July 2026 — not yet GA. Verify current feature status at learn.microsoft.com/fabric/fundamentals/whats-new. UIG Data Lab is an independent publication, not affiliated with or endorsed by Microsoft Corporation.

A.J. Data Engineering Researcher & Technical Writer · UIG Data Lab
A.J. researches and writes about data engineering, analytics architecture, Microsoft Fabric, and modern cloud data platforms. Coverage spans Microsoft Fabric, Power BI, Azure Data Engineering, Databricks, Snowflake, Apache Spark, dbt, Apache Airflow, and modern cloud data infrastructure. The focus is practitioner-level content that helps data professionals understand platform capabilities, evaluate technology decisions, optimise costs, and implement practical solutions using official documentation, product updates, community insights, and industry best practices. His writing covers real decisions from real deployments — not documentation rewrites.
Microsoft Fabric Lakehouse Data Warehouse OneLake Delta Lake Medallion Architecture Direct Lake Apache Spark Power BI Azure

Scroll to Top