Fabric Lakehouse vs Data Warehouse Best Practices — 2026 Guide
The official Microsoft decision framework, the June 2026 updates that change the calculus — GPU-accelerated Warehouse, Materialized Lake Views GA, ALTER COLUMN, Time Travel at SQL analytics endpoint, Datawarehouse Monitor — and the production patterns that enterprise teams actually use.
Both Lakehouse and Data Warehouse store Delta tables on OneLake — the storage choice is not the decision. Microsoft’s official framework reduces to three questions: Do you prefer Spark or T-SQL? Do you need multi-table ACID transactions? Is your data structured only, or mixed? As of June 2026, the Warehouse gained GPU acceleration (CoddSpeed, SIGMOD 2026 Best Industry Paper) delivering up to 7× faster performance at high concurrency — early access preview July 2026. The Lakehouse gained Materialized Lake Views GA with cross-workspace extended lineage, closing the Gold-layer automation gap that previously pushed teams toward Warehouse by default.
Fabric Lakehouse vs Data Warehouse: The Decision Framework
Microsoft publishes a formal decision guide at learn.microsoft.com/fabric/fundamentals/decision-guide-lakehouse-warehouse. It reduces to a three-question tree. The framework applies regardless of data volume — it is about skill set, transaction requirements, and data shape, not scale.
| Decision Criterion | Answer → Use Lakehouse | Answer → Use Warehouse |
|---|---|---|
| Development model How does your team prefer to work? | Spark, PySpark, Scala, Python notebooks | T-SQL, SQL Server tooling (SSMS, Azure Data Studio) |
| Multi-table transactions Do you need full ACID across multiple tables? | No — single-table ACID via Delta is sufficient | Yes — you need INSERT/UPDATE/DELETE/MERGE across multiple tables atomically |
| Data complexity What formats are you analysing? | Mixed: unstructured, semi-structured (JSON, logs, images), or don’t know yet | Structured only: relational data with stable schemas |
Microsoft’s official guidance explicitly states that you can always add one or the other at a later point should your business needs change. Starting with Lakehouse does not lock you out of Warehouse. Both use the same Polaris SQL engine for T-SQL queries. Regardless of where you start, compute is fully compatible and data stays in the same OneLake location.
The most common mistake in this decision is choosing based on team size or data volume rather than the three criteria above. A team of two data engineers can be building a Warehouse-first architecture. A team of twenty can be entirely Lakehouse-first. The decision is about what tools your team knows and what transactional guarantees your Gold layer needs — not how big you are. Defaulting to Warehouse because it “sounds more enterprise” and defaulting to Lakehouse because it “sounds more modern” are both wrong reasons to choose.
OneLake — Why the Storage Is No Longer the Decision
In every data platform that predates Fabric, choosing a storage engine was the primary architectural decision: Parquet on ADLS vs. Synapse Dedicated Pool vs. Databricks Delta vs. Snowflake internal storage. Each choice created a silo. Moving data between them required ETL pipelines, refresh schedules, and duplicate storage costs.
OneLake changes this completely. Both Lakehouse and Data Warehouse store their data as Delta Parquet files in OneLake. There is one physical location. The Lakehouse Spark engine, the Lakehouse SQL analytics endpoint, and the Warehouse T-SQL engine all read the same files. Power BI Direct Lake reads the same files. No copies. No synchronisation.
The decision between Lakehouse and Warehouse is therefore a decision about which compute surface and access model fits your workload — not which storage system to commit to. You can switch, combine, or add later without moving data.
| Capability | Lakehouse | Data Warehouse |
|---|---|---|
| Storage format | Delta Parquet in OneLake | Delta Parquet in OneLake |
| Primary compute | Apache Spark (Runtime 1.3 — Spark 3.5) | Polaris distributed T-SQL engine |
| SQL access | Read-only SQL analytics endpoint (T-SQL) | Full read/write T-SQL with DML |
| DML (INSERT/UPDATE/DELETE) | Via Spark only — not in SQL analytics endpoint | Full T-SQL DML — multi-table transactions |
| Power BI Direct Lake | Supported — SQL analytics endpoint must be refreshed | Supported — Refresh SQL Endpoint activity (GA June 2026) |
| Shortcuts (external data) | Full support — ADLS, S3, GCS, other Lakehouses | Read via cross-database queries to Lakehouse |
| Materialized Lake Views | GA March 2026 — SQL and PySpark, cross-workspace lineage (June 2026) | Not available |
| Time Travel | Via Spark or SQL analytics endpoint (Preview, June 2026) | T-SQL — FOR TIMESTAMP AS OF — configurable retention (default 30 days) |
Fabric Lakehouse — What It Is and When to Use It
A Fabric Lakehouse is a data architecture platform built on Delta tables in OneLake. It has two primary access surfaces: the Spark engine (via notebooks and Spark Job Definitions) for code-first data engineering and data science, and the SQL analytics endpoint (a read-only T-SQL interface) for SQL-based consumption without Spark.
When you create a Lakehouse, Fabric automatically creates the SQL analytics endpoint and a default Power BI semantic model alongside it. SQL analysts and Power BI users get immediate access to the same data the Spark engineer just wrote — with no separate load step.
Lakehouse Is the Right Choice When
- Your team writes PySpark, Scala, or Python notebooks daily
- Data includes JSON, logs, images, or other semi-structured/unstructured formats
- Schema evolves rapidly and schema-on-read flexibility is needed
- Workload involves ML feature engineering, model training, or data science experiments
- Bronze and Silver medallion layers with fast-changing raw data
- You want Materialized Lake Views for automated incremental Gold-layer materialisation
Lakehouse Limitations to Know
- SQL analytics endpoint is read-only — no DML via T-SQL
- Multi-table T-SQL transactions are not supported — use Spark Delta MERGE for upserts
- ALTER COLUMN is not available at the SQL analytics endpoint
- Time Travel via T-SQL at SQL analytics endpoint is Preview (June 2026) — limited by VACUUM retention settings per table
- OneSecurity (OneLake data access roles) is incompatible with Direct Lake semantic models on the same Lakehouse (known issue, June 2026)
The Lakehouse SQL analytics endpoint gets a Refresh SQL Endpoint Activity (GA, June 2026) as a native pipeline step — refresh the endpoint on demand inside the same pipeline as your ETL load, so downstream BI queries always read the latest state without a separate scheduling workaround. See the full deep-dive in the Fabric Lakehouse Tutorial.
Fabric Data Warehouse — What It Is and When to Use It
A Fabric Data Warehouse is a lake-centric, distributed T-SQL engine built on Delta tables in OneLake. It provides the full SQL Server-compatible T-SQL surface: DDL, DML (INSERT, UPDATE, DELETE, MERGE), stored procedures, views, functions, schemas, and cross-database querying via three-part names. Teams migrating from SQL Server, Azure Synapse Dedicated Pools, or Teradata will recognise the development model immediately.
Warehouse Is the Right Choice When
- Your team works primarily in T-SQL and SQL Server tooling (SSMS, Azure Data Studio)
- You need full multi-table ACID transactions via T-SQL
- Workload is structured data only with stable schemas and dimensional models
- Gold-layer star schemas, conformed dimensions, and BI-ready fact tables
- High-concurrency interactive query workloads serving many simultaneous Power BI users
- You need ALTER COLUMN, Time Travel via T-SQL, or Datawarehouse Monitor for query observability
GPU Acceleration — Build 2026 (CoddSpeed)
- NVIDIA accelerated computing built directly into the Warehouse query engine
- No query rewrites, no infrastructure setup, no cluster to size
- Enable in workspace settings — query optimizer routes eligible queries to GPU automatically
- Up to 7× faster than three comparable cloud warehouses at 64-user concurrency (internal benchmarking, May 2026)
- UNC Health: 5× production improvement on existing workloads
- CoddSpeed: SIGMOD 2026 Best Industry Paper — first fully managed SaaS warehouse with GPU acceleration
- Early access preview: July 2026
Fabric now provides a unified Analyse data with menu that gives a consistent way to move from data to analysis across Lakehouse, Warehouse, and Eventhouse. Instead of navigating different menus per workload, analysis starts from a single, predictable entry point. The SQL Query Editor in Warehouse also expanded with integrated experiences: OneLake Analytics (KQL and Spark on warehouse data), Copy Job creation directly from the editor, Power BI Direct Lake over OneLake semantic model creation, and Fabric Ontologies for defining business concepts over warehouse tables.
June 2026 — What Changed for Both
The June 2026 release (Build 2026, June 2) and the June Feature Summary (June 2, 2026) introduced substantial changes to both Lakehouse and Warehouse that affect architecture decisions made in 2026.
Data Warehouse — June 2026 Updates
| Feature | Status | What It Changes |
|---|---|---|
| GPU-Accelerated Warehouse (CoddSpeed) | Early Access — July 2026 | Up to 7× faster analytical queries at high concurrency. Enable in workspace settings. SIGMOD 2026 Best Industry Paper. No query rewrites required. |
| ALTER COLUMN | GA — June 2026 | Change column length, precision, and scale with standard T-SQL syntax. Takes effect instantly with no data rewrite. Eliminates CTAS-based workarounds in CI/CD and migration workflows. |
| Time Travel via T-SQL | Preview | Query warehouse tables as they existed at any prior point in time using FOR TIMESTAMP AS OF. Default 30-day configurable retention window. Cross-warehouse time travel within the same workspace. |
| Time Travel at SQL Analytics Endpoint | Preview — June 2026 | Extends T-SQL time travel to the Lakehouse SQL analytics endpoint. Retention limited by per-table VACUUM settings. Important: this feature now means both Lakehouse and Warehouse support Time Travel via T-SQL. |
| Datawarehouse Monitor (formerly Query Activity) | Preview — June 2026 | Single place to monitor running and historical queries. Per-query metrics: CPU time, data scanned, cache usage. Drill into query run history without writing T-SQL. Replaces the previous Query Activity view. |
| SQL Analytics Endpoint CI/CD (DacFx) | Preview — June 2026 | Version-control and deploy SQL analytics endpoint objects via DacFx. Item Definition APIs enable programmatic CRUD for CI/CD pipeline integration and environment management. |
Lakehouse — June 2026 Updates
| Feature | Status | What It Changes |
|---|---|---|
| Materialized Lake Views (MLV) extended lineage | June 2026 | MLVs now span Lakehouse boundaries across workspaces. One schedule in Gold cascades through Silver and Bronze automatically. Faulted nodes flag missing dependencies before execution. Removes the primary orchestration reason for choosing Warehouse at the Gold layer. |
| Refresh SQL Endpoint Activity | GA — June 2026 | Refresh the Lakehouse SQL analytics endpoint as a native pipeline step. Sync refreshes with ETL loads so Direct Lake semantic models always reflect the latest data. |
| OneLake Storage Tiers + Lifecycle Management | Preview — June 2026 | Hot/Cool/Cold tiers with automatic lifecycle policies. Move Bronze archives to Cold storage automatically — reduce cost without deleting compliance data. |
| High Concurrency mode for Lakehouse | GA — January 2026 | Up to 5 independent Lakehouse jobs share one Spark session. Reduces startup wait time on smaller F-SKU capacities. Material improvement for interactive development on F2–F8. |
| Lakehouse Maintenance Activity (VACUUM + OPTIMIZE) | Preview | Run VACUUM and OPTIMIZE as native pipeline steps — no separate maintenance notebook required. Schedule nightly in the same pipeline as ETL load. |
Medallion Architecture — Layer Mapping in 2026
Microsoft’s recommended pattern places Lakehouse at Bronze and Silver, with a choice at Gold between Lakehouse-with-MLV and Data Warehouse. The June 2026 updates to both shift that choice in important ways.
| Layer | Recommended Engine | Primary Tools | When to Use Warehouse at This Layer |
|---|---|---|---|
| Bronze (Raw landing) | Lakehouse | Data Factory Copy Job, Eventstream, Data Mirroring, Spark notebooks | Almost never — Lakehouse handles all source formats; Bronze data is rarely structured-only at landing |
| Silver (Cleaned, conformed) | Lakehouse | Spark notebooks (Delta MERGE), Materialized Lake Views, Dataflow Gen2 | Rarely — only if Silver data is already structured-only and BI teams need direct T-SQL write access to Silver tables |
| Gold (Curated, BI-ready) | Either — depends on requirements | MLV (Lakehouse) or T-SQL star schema (Warehouse) + Direct Lake Power BI | Use Warehouse at Gold when: multi-table T-SQL transactions are required; BI team works exclusively in T-SQL; high-concurrency GPU-accelerated queries needed; or schema requires ALTER COLUMN in CI/CD |
Better Together — The Standard Enterprise Pattern
The most common production pattern in Fortune 500 Fabric deployments: Lakehouse for Bronze and Silver (Spark-based ETL, CDC, feature engineering), Data Warehouse for Gold (T-SQL star schemas, dimensional models, certified semantic models). All layers remain in OneLake — the Warehouse reads Lakehouse Silver tables via cross-database queries using three-part names, and the Gold layer adds only the curated dimensional model on top, not a separate copy of the data.
Before Materialized Lake Views GA (March 2026), teams often chose Warehouse at Gold purely for automated materialisation — Spark notebooks required manual scheduling that a Warehouse stored procedure did not. MLVs with extended lineage (June 2026) eliminate that gap: the MLV automatically materialises Gold tables from Silver, dependency-aware, incremental, cross-workspace. Choose Warehouse at Gold when you additionally need full T-SQL DML, multi-table transactions, ALTER COLUMN in CI/CD, or the GPU-accelerated query engine at high concurrency. Choose Lakehouse with MLV at Gold when your Gold layer is primarily SQL-definable aggregations with no multi-table write requirements and BI teams consume via Direct Lake.
Ingestion — Which Path for Which Target
Every Fabric ingestion method can write to both Lakehouse and Warehouse. The right choice depends on the source type, latency requirement, and whether the team prefers code-first or low-code tooling — not on the destination engine.
| Ingestion Method | Best Target | Typical Use |
|---|---|---|
| Copy Job Activity (GA April 2026) | Lakehouse Bronze | CDC from SQL Server, Azure SQL, Azure SQL MI (GA). Bulk load, incremental, SCD Type 2 for Warehouse. Auto-schema detection, no manual mapping. |
| Data Pipelines — Copy Activity | Lakehouse or Warehouse | 90+ connectors, custom schema mapping, fault tolerance. Use when Copy Job does not expose the configuration needed. |
| Dataflow Gen2 | Lakehouse Silver or Warehouse Gold | Visual Power Query with 200+ connectors. Low-code preparation and transformation. Writes Delta tables to OneLake destination. |
| Spark Notebooks | Lakehouse Bronze or Silver | Complex ETL, CDC with Delta MERGE, custom schema enforcement, ML feature engineering. Cannot write directly to Warehouse tables — land in Lakehouse, cross-query from Warehouse. |
| Eventstream | Lakehouse or Eventhouse | Streaming from Event Hubs, IoT Hub, Kafka, CDC sources. Routes to Lakehouse Delta tables and/or Eventhouse KQL simultaneously. |
| Data Mirroring | Lakehouse (as Delta tables in OneLake) | Near-real-time zero-ETL replication from Azure SQL DB, Cosmos DB, Snowflake, Databricks Unity Catalog. CDC for SQL estates now GA. |
This is the most common architectural surprise for teams migrating from Databricks or Synapse Spark. Spark notebooks in Fabric write to Lakehouse Delta tables — they cannot write directly to Warehouse tables via spark.write.saveAsTable(). The standard pattern: Spark writes to a Lakehouse Silver table, the Warehouse reads it via cross-database query using three-part naming (silverlakehouse.schema.tablename) or loads it via a T-SQL CREATE TABLE AS SELECT or INSERT INTO ... SELECT.
Governance and Security — Key Differences
Both Lakehouse and Warehouse use OneLake at the storage layer, so Purview lineage, sensitivity labels, and OneLake audit logs apply to both. The difference is in how each engine exposes access control to teams who need fine-grained permissions.
Lakehouse Security Model
- Workspace RBAC + item-level permissions
- OneLake security roles (OneSecurity — Preview) — folder, table, row, column level
- RLS at semantic model level for Direct Lake
- Known limitation: OneSecurity blocks Direct Lake semantic models — manage RLS at semantic model level until resolved
- Sensitivity labels via Microsoft Purview inherit automatically from M365
Warehouse Security Model
- Full T-SQL security: schemas, roles, GRANT/REVOKE/DENY
- Dynamic data masking for column-level obfuscation
- Row-level security via T-SQL predicates
- More familiar to DBAs from SQL Server and Azure Synapse
- Warehouse permissions do not cascade to semantic model RLS automatically — configure at both layers
For a complete governance strategy covering both engines — OneLake catalog Govern tab, Domains, Purview DLP, and the Outbound Access Protection GA — see the Microsoft Fabric Data Governance Tutorial.
Cost and Performance — What Drives the Numbers
Both Lakehouse and Warehouse consume CUs from the same F-SKU capacity pool. There is no separate pricing tier per engine. Cost differences come from workload patterns, not from being charged differently per engine type.
| Factor | Lakehouse (Spark) | Data Warehouse (T-SQL) |
|---|---|---|
| Interactive query latency | 5–15 second Spark cold start (Starter Pools). Subsequent queries fast within session. | No cold start — serverless SQL responds immediately. Preferred for interactive BI with many users. |
| High-concurrency BI | SQL analytics endpoint handles concurrent T-SQL. GPU acceleration not available. | Optimised for high-concurrency. GPU acceleration (CoddSpeed) delivers up to 7× improvement at 64-user concurrency — early access July 2026. |
| Heavy transformation (TB+) | Spark with Starter Pools — cost-effective for large-scale batch jobs. Partition, broadcast, and V-Order for performance. | T-SQL engine handles large loads well — no advantage over Spark for volume. Choose based on skill set. |
| ML and feature engineering | Spark — full Python ecosystem, scikit-learn, MLflow. Notebooks access all Lakehouse data. | Not the right tool — use Lakehouse for ML workloads even in a Warehouse-primary architecture. |
| Storage cost optimisation | OneLake storage tiers (Preview, June 2026) — move cold Bronze data to Cool/Cold tier automatically. | Configurable retention for Time Travel (default 30 days). Longer retention increases storage cost. |
For detailed CU estimation and workload-specific cost modelling, the Microsoft Fabric Pricing Calculator covers both Lakehouse Spark CU consumption and Warehouse SQL CU consumption by workload type. For production stability, burst management, and capacity right-sizing, the Fabric Capacity Optimization guide covers the scheduling and pause strategies that prevent throttling.
Frequently Asked Questions
spark.write.saveAsTable() or spark.write.format("delta") targeting a Warehouse. The standard pattern in a Better Together architecture: Spark transforms and writes to a Lakehouse Silver table, then the Warehouse loads or references it via cross-database T-SQL query using three-part names (silverlakehouse.schema.tablename), or via a T-SQL INSERT INTO gold_table SELECT * FROM silverlakehouse.schema.tablename. This is not a workaround — it is the documented Microsoft pattern for combining both engines.Feature descriptions are based on official Microsoft Learn documentation, the Fabric June 2026 Feature Summary, and the Build 2026 Azure blog (published June 2, 2026). GPU-accelerated Warehouse early access is scheduled for July 2026 — not yet GA. Verify current feature status at learn.microsoft.com/fabric/fundamentals/whats-new. UIG Data Lab is an independent publication, not affiliated with or endorsed by Microsoft Corporation.



