Tutorial Series · Part 03 · Microsoft Fabric

Data Warehousing in Fabric — Complete 2026 Guide

Everything you need to design, implement, optimize, and operate a production data warehouse on Microsoft Fabric — from medallion architecture and T-SQL patterns to GPU-accelerated performance, AI Functions, and Synapse migration. Verified against official Microsoft documentation through June 2026.

Quick Answer

The Fabric Data Warehouse is an enterprise-grade, SaaS-delivered relational warehouse built on open Delta Parquet tables in OneLake. It provides full T-SQL — DDL/DML, stored procedures, materialized views, and multi-table ACID transactions — while sharing storage with Lakehouse and Power BI Direct Lake with no data movement. As of June 2026, a GPU-accelerated version entered early access preview, delivering up to 7× faster performance at 64-user concurrency. AI Functions now default to gpt-5-mini. Choose it over Lakehouse when your team works primarily in T-SQL and needs governed, curated star or snowflake schemas for BI.

📅 Last verified: June 2026 ⏱ ~18 min read ✍️ A.J., Data Engineering Researcher 🔗 Source: Microsoft Learn

Why Data Warehousing in Fabric?

The Fabric Data Warehouse is not another cloud SQL service layered on top of object storage. It is built lake-native from the ground up — data lives as open Delta Parquet files in OneLake, which means the same rows that power your T-SQL queries are immediately accessible to Spark notebooks, Power BI Direct Lake models, and Fabric Data Agents without any ETL, copy, or export step.

That architecture eliminates the storage duplication and pipeline complexity that drives up cost and staleness in traditional warehouse-plus-lake setups. One copy of data, governed in one place, served to every consumer through the compute engine best suited to their work.

🔷

ACID on Open Delta

Full multi-table ACID transactions on Delta Parquet. No proprietary storage format — your data stays open and portable.

📊

Built-in Power BI

Warehouse tables feed Power BI Direct Lake models directly — no import, no dual storage, no refresh lag for large semantic models.

SQL + Spark Together

T-SQL and PySpark read from the same Delta tables in OneLake. Use notebooks for heavy ELT, warehouse for governed serving — no data movement between them.

🏗️

OneLake-Native

All warehouse data lives in OneLake. Shortcuts, mirroring, and cross-workspace reads work without copying data into or out of the warehouse.

📌
Part of the Fabric Tutorial Series

This is Part 03 of the Microsoft Fabric Tutorial Series. If you are new to Fabric, start with Part 01 — Microsoft Fabric Overview and Part 02 — Fabric Lakehouse Tutorial first.

Architecture — Warehouse, Lakehouse & SQL Analytics Endpoint

Fabric has two primary warehousing surfaces. Understanding which to use — and when to combine them — is the most important architectural decision you will make early in a Fabric deployment.

FeatureFabric Data WarehouseLakehouse SQL Analytics Endpoint
Primary personaSQL/BI developers, DBAsData engineers, Spark users
DDL supportFull — CREATE, ALTER, DROPRead-only (auto-generated from Delta)
DML supportFull — INSERT, UPDATE, DELETE, MERGENot supported via SQL endpoint
Stored procedures✅ Yes❌ No
Materialized views✅ Yes❌ No
Multi-table ACID✅ Yes❌ No
Storage formatDelta Parquet in OneLakeDelta Parquet in OneLake
Direct Lake Power BI✅ Yes✅ Yes
🏦

Choose Fabric Data Warehouse when…

  • Your team works primarily in T-SQL
  • You need stored procedures, materialized views, or multi-table ACID
  • Workloads are structured star/snowflake schemas for governed BI
  • You are migrating from Synapse Dedicated SQL Pools or SQL Server
🏞️

Choose Lakehouse when…

  • Spark is your primary compute for ELT and data science
  • You need to store mixed, semi-structured, or raw data at scale
  • You want to expose curated layers through the auto-generated SQL endpoint
  • You plan to combine a Lakehouse and Warehouse in the same architecture

For a deeper comparison including cost trade-offs, see Fabric Lakehouse vs Data Warehouse — Architecture Decision Guide.

Schema Design — Medallion Architecture & Modeling Patterns

Structure your data across clear layers in OneLake and Warehouse. Each layer has a distinct purpose — preserving raw lineage while delivering clean, governed data to BI consumers.

  1. Raw / LandingPreserve original files immutably — CSV, JSON, Parquet — in Lakehouse or mirrored sources. Never transform in this layer. It is your compliance and replay foundation.
  2. Bronze / StagingLight cleaning: standardize schema, enforce basic types, tag source system and ingestion timestamp. Feed downstream transformations from here.
  3. Silver / RefinedNormalize, deduplicate, apply business rules. Typically Lakehouse tables populated by Spark notebooks or Dataflow Gen2. Most complex transformation logic lives here.
  4. Gold / WarehouseDenormalized Warehouse tables — facts and dimensions — tuned for T-SQL queries, materialized views, and Power BI semantic models. This is your governed serving layer.

Modeling Best Practices

Use star schemas with narrow fact tables and conformed dimensions. This simplifies DAX, reduces query complexity for Power BI, and keeps fact table scans fast. Partition facts by date or business period to align with common filter patterns and optimize Delta file scan ranges.

When ingesting from multiple source systems, use surrogate keys in dimensions and maintain business keys for lineage. Adopt a consistent naming convention — for example fact_ prefix for fact tables and dim_ for dimensions — to improve discoverability in the Fabric item catalog.

Field note — A.J., UIG Data Lab

The most common modeling mistake in Fabric is replicating wide, flat tables from Synapse or SQL Server. Fabric’s distributed engine performs best on narrow fact tables with foreign key joins to small dimension tables. If your fact table has more than 50 columns, it is almost certainly carrying attributes that belong in a dimension.

Implementation — T-SQL, Delta Lake & Spark Patterns

Create a curated fact table with CTAS

Use CREATE TABLE AS SELECT to build initial fact tables from refined Silver layer data. Fabric handles distribution automatically — do not include Synapse-style DISTRIBUTION or CLUSTERED COLUMNSTORE INDEX syntax.

CREATE TABLE dw.fact_sales
AS
SELECT
  order_id,
  customer_id,
  product_id,
  CAST(sale_date  AS DATE)           AS sale_date,
  CAST(quantity   AS INT)            AS quantity,
  CAST(amount     AS DECIMAL(18,2))  AS amount
FROM refined.sales_transactions
WHERE amount >= 0;

MERGE for idempotent incremental loads

Use MERGE for upsert logic on incremental fact table loads from CDC or staged updates.

MERGE INTO dw.fact_sales AS target
USING staging.sales_updates AS source
  ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET
    target.quantity  = source.quantity,
    target.amount    = source.amount,
    target.sale_date = source.sale_date
WHEN NOT MATCHED THEN
  INSERT (order_id, customer_id, product_id, sale_date, quantity, amount)
  VALUES (source.order_id, source.customer_id, source.product_id,
          source.sale_date, source.quantity, source.amount);

Hybrid — Spark notebook + Warehouse CTAS

Use Spark for heavy distributed transformations in the Silver layer, then CTAS in the Warehouse for the governed Gold table. Both read the same Delta files in OneLake.

# PySpark in Fabric notebook — build Silver layer
df_raw = spark.read.table("landing.sales_raw")
df_clean = (
    df_raw
    .dropDuplicates(["order_id"])
    .filter("amount >= 0")
)
df_clean.write.format("delta").mode("overwrite")     .saveAsTable("refined.sales_clean")
# Fabric Warehouse then uses refined.sales_clean as source for CTAS
🆕
June 2026 — Optimised Warehouse to Lakehouse Data Movement

The June 2026 Fabric update includes a performance improvement for Warehouse to Lakehouse data movement via Dataflow Gen2 staging compute. If you move data from Warehouse into a Lakehouse destination via Dataflow Gen2, this is now faster without any configuration change.

For Spark transformation patterns in detail, see Transform Data Using Notebooks in Fabric.

Performance Tuning & Cost Optimization

Fabric’s distributed SQL engine provides automatic workload management for standard workloads. Physical design and query patterns still determine cost and latency at scale.

  1. Partition by date for fact tablesPartition large fact tables by the date column that aligns with your most common query filters. This allows the engine to skip entire Delta file partitions rather than full scans.
  2. OPTIMIZE and Z-ORDERRun OPTIMIZE periodically to compact small Delta files created by incremental loads. Z-ORDER on your most selective filter columns — such as customer_id or product_id — to co-locate related rows and reduce I/O.
  3. Materialized views for hot aggregationsCreate materialized views for expensive aggregations that power high-concurrency dashboards. The engine refreshes them incrementally without manual refresh scheduling.
  4. Use Direct Lake for Power BIConnect Power BI semantic models to Warehouse tables via Direct Lake mode rather than Import. This eliminates dual storage, removes import refresh latency, and reduces capacity consumption.
  5. Monitor with Capacity MetricsReview query history and the Fabric Capacity Metrics app regularly to identify long-running queries and workloads that should be rescheduled off-peak.
-- Compact Delta files and Z-ORDER on selective filter column
OPTIMIZE dw.fact_sales
ZORDER BY (customer_id);

-- Materialized view for a dashboard aggregation
CREATE MATERIALIZED VIEW dw.mv_sales_by_region
AS
SELECT
  region,
  CAST(sale_date AS DATE)  AS sale_date,
  SUM(amount)              AS total_revenue,
  COUNT(*)                 AS order_count
FROM dw.fact_sales f
JOIN dw.dim_customer c ON f.customer_id = c.customer_id
GROUP BY region, CAST(sale_date AS DATE);

For deeper capacity planning and pause/resume strategies, see Fabric Data Warehouse Optimization and Fabric Capacity Optimization.

GPU-Accelerated Warehouse & AI Functions — June 2026

GPU-Accelerated Fabric Data Warehouse

At Microsoft Build 2026, Microsoft announced GPU-accelerated Fabric Data Warehouse as an early access preview, entering early access in June 2026 with GA targeted for July 2026.

🚀
What It Is

GPU-accelerated Fabric Data Warehouse uses NVIDIA accelerated computing integrated directly into the Fabric query engine. Eligible queries run on GPU hardware automatically — no query rewrites, no infrastructure management, no configuration required. The underlying research is CoddSpeed, published at ACM SIGMOD 2026 where it won the Best Industry Paper award, using a GPU-based execution engine derived from Microsoft’s Tensor Query Processor (TQP).

DetailValue (June 2026)
Performance gainUp to 7× faster at 64-user concurrency — internal benchmarks, May 2026
HardwareNVIDIA accelerated computing via NVLink and InfiniBand
Query changes requiredNone — same T-SQL, eligible queries accelerated automatically
Infrastructure managementNone — fully managed by Fabric
StatusEarly Access Preview — GA targeted July 2026
ResearchCoddSpeed — ACM SIGMOD 2026 Best Industry Paper
⚠️
Early Access Preview — Verify Before Production Reliance

Not all queries are eligible for GPU execution — the engine selects eligible workloads automatically. Verify current availability and eligibility at Microsoft Learn before planning production deployments around GPU performance figures.

AI Functions in Fabric Data Warehouse — June 2026

As of June 2026, AI Functions in Fabric Data Warehouse default to gpt-5-mini with low reasoning enabled. For more complex transformations, configure gpt-5.1 and tune the reasoning_effort parameter. AI Functions support sentiment analysis, text classification, summarization, and entity extraction directly within T-SQL queries and Dataflow Gen2.

🚨
Model Retirement — Action Required

If you have pipelines pinned to gpt-4.1 (retired May 30, 2026), migrate them to gpt-5.1. If pinned to gpt-4.1-mini (retiring June 15, 2026), migrate to gpt-5-mini. Pipelines still referencing retired models will fail.

Operationalizing — Pipelines, Monitoring & CI/CD

Production data warehouses require reliable orchestration, testable deployments, and observable pipelines. Fabric provides native tooling for all three.

Orchestration with Data Pipelines

Use Data Pipelines to orchestrate Warehouse stored procedures, notebook executions, and Dataflow Gen2 runs. Configure retry policies and failure alerts per activity. Pass run_date, environment, and source system parameters into activities so the same pipeline works across dev, test, and prod workspaces.

-- Audit table: log row counts after every load
INSERT INTO dw.load_audit (table_name, load_ts, row_count, status)
SELECT
  'dw.fact_sales',
  SYSDATETIME(),
  COUNT(*),
  'Succeeded'
FROM dw.fact_sales;

CI/CD and Git Integration

Connect your Fabric workspace to a Git repository — Azure DevOps or GitHub. Warehouse DDL scripts, stored procedures, and pipeline definitions are version-controlled. Use pull request approvals and automated integration tests — row count checks, constraint validation, DAX measure assertions — before merging schema changes to production.

🆕
June 2026 — Refresh SQL Endpoint Activity Now Generally Available

The Refresh SQL Endpoint pipeline activity reached GA in the June 2026 update. Add it to your pipeline after a Lakehouse write to ensure the SQL analytics endpoint reflects the latest Delta changes before downstream queries run. This removes the need for manual endpoint refreshes or wait activities.

For step-by-step orchestration examples, see Data Pipelines in Microsoft Fabric.

Security & Governance

Fabric centralises Warehouse, Lakehouse, and Power BI on OneLake — which makes consistent identity, access control, lineage, and monitoring essential for compliance and data trust.

  1. Workspace RBACAssign Admin, Member, Contributor, or Viewer roles to control who can create or modify Warehouse items, pipelines, and notebooks.
  2. Object-level securityUse T-SQL GRANT/REVOKE to control access to specific tables, views, and stored procedures. Workspace-level access does not automatically grant access to all tables.
  3. Row-level securityApply row filters using T-SQL security predicates when different user groups must see different subsets of the same table.
  4. Column maskingMask or hash PII columns in the Warehouse, or expose only masked views to downstream consumers. Enforce at the data layer — not in Power BI measure logic.
  5. Purview integrationRegister Warehouse objects in Microsoft Purview for lineage, sensitivity labels, and DLP policies. Test DLP policies before rollout to confirm they do not block expected query patterns.
-- Read-only role for analytics team
CREATE ROLE dw_readonly;
GRANT SELECT ON SCHEMA::dw TO dw_readonly;
EXEC sp_addrolemember 'dw_readonly', 'analytics_team_group';

-- Row-level security: reps see only their region
CREATE FUNCTION dw.fn_rls_region (@region NVARCHAR(50))
RETURNS TABLE WITH SCHEMABINDING
AS RETURN SELECT 1 AS result
WHERE @region = USER_NAME() OR IS_MEMBER('dw_admin') = 1;

CREATE SECURITY POLICY dw.region_filter
ADD FILTER PREDICATE dw.fn_rls_region(region) ON dw.fact_sales
WITH (STATE = ON);

For a dedicated governance guide including Purview setup, see Microsoft Fabric Data Governance Tutorial.

Synapse Dedicated SQL Pool to Fabric — Migration Guide

T-SQL compatibility between Synapse Dedicated SQL Pools and Fabric Warehouse is high. The differences are in DDL syntax related to distribution and storage, which Fabric manages automatically.

Step 1 — Remove Synapse-Specific DDL Syntax

Remove DISTRIBUTION (HASH, ROUND_ROBIN, REPLICATE) and CLUSTERED COLUMNSTORE INDEX from all CREATE TABLE statements. Fabric’s engine manages distribution and columnar storage automatically. Also remove REBUILD, REORGANIZE INDEX, and CREATE STATISTICS calls — Fabric handles these automatically via OPTIMIZE.

-- Synapse (remove the WITH clause for Fabric)
CREATE TABLE dbo.fact_sales
(
  order_id    INT            NOT NULL,
  customer_id INT            NOT NULL,
  sale_date   DATE           NOT NULL,
  amount      DECIMAL(18,2)
)
WITH (DISTRIBUTION = HASH(customer_id),
      CLUSTERED COLUMNSTORE INDEX);  -- remove this block

-- Fabric (clean DDL)
CREATE TABLE dw.fact_sales
(
  order_id    INT            NOT NULL,
  customer_id INT            NOT NULL,
  sale_date   DATE           NOT NULL,
  amount      DECIMAL(18,2)
);

Step 2 — Data Movement Strategy

  1. Export from Synapse via CETASUse CREATE EXTERNAL TABLE AS SELECT in Synapse to export data to ADLS Gen2 as Parquet. This runs in parallel and is orders of magnitude faster than row-by-row extraction.
  2. Create a Shortcut in OneLakeIn your Fabric Lakehouse, create a Shortcut pointing to the ADLS Gen2 container where Synapse exported the Parquet files. The data appears as a table in OneLake immediately — no copy.
  3. Ingest to Warehouse with COPY INTOUse COPY INTO in Fabric Warehouse to load the Parquet data at high speed from the Shortcut. This is the fastest supported path for bulk historical data ingestion.
🔧
Fabric Migration Assistant for Data Warehouse

Microsoft provides a Migration Assistant for Data Warehouse that assesses your Synapse Dedicated SQL Pool and generates a migration readiness report. As of June 2026, the Migrate button is directly accessible from the Fabric workspace menu — no separate tool download required.

For the full feature comparison, see Microsoft Fabric vs Azure Synapse — Full Comparison & Limitations.

Common Errors & Fixes

Error / SymptomRoot CauseResolution
Error 24801 — Query CancelledCapacity throttling — burst usage exceeds F-SKU limitsCheck Fabric Capacity Metrics app. Avoid SELECT * on large tables. Smooth workloads across 24 hours or upgrade F-SKU. See capacity optimization guide.
Transaction log full during large DMLDelta log limit reached in a single massive DELETE/UPDATE transactionBatch large DML into smaller chunks — for example, delete by date partition rather than a single WHERE clause across billions of rows.
Direct Lake fallback to DirectQueryViews with unsupported T-SQL or non-deterministic functions used in Direct Lake modelMaterialize views into physical tables. Remove non-deterministic functions from objects consumed by Direct Lake. See Direct Lake fallback fix guide.
Error 110802 — OpenRowset path invalidIncorrect ABFSS path format or missing Read permissions on OneLake itemVerify path format: abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>/… Confirm the identity has Read permission on the Lakehouse item.
COPY INTO fails — format mismatchParquet schema mismatch between source file and target table definitionVerify column names and data types match exactly. Use FORMAT = ‘PARQUET’ and inspect source schema with a SELECT from the external file before running COPY INTO.
AI Function error after May 30, 2026Pipeline pinned to retired gpt-4.1 modelMigrate from gpt-4.1 to gpt-5.1, or from gpt-4.1-mini to gpt-5-mini. Update all hardcoded model references in Dataflow Gen2 and Warehouse AI Function calls.

FAQ — Data Warehousing in Microsoft Fabric

Data Warehousing in Microsoft Fabric is an enterprise-grade, SaaS-delivered relational warehouse built on open Delta Parquet tables in OneLake. It provides full T-SQL support including DDL/DML, stored procedures, materialized views, and multi-table ACID transactions, while sharing the same OneLake storage with Lakehouse and Power BI — no data movement required between them.
The Fabric Data Warehouse is a fully managed SQL item with complete DDL/DML, stored procedures, and ACID transactions — designed for curated star/snowflake schemas and governed BI. The Lakehouse is built for mixed and semi-structured data with Spark as the primary compute, exposing tables through an auto-generated SQL analytics endpoint. Both store data as Delta tables in OneLake and can be used together.
GPU-accelerated Fabric Data Warehouse entered early access preview in June 2026. It uses NVIDIA accelerated computing integrated directly into the Fabric query engine with no query rewrites required. Internal benchmarks from May 2026 showed up to 7x faster performance at 64-user concurrency. The underlying CoddSpeed research won Best Industry Paper at ACM SIGMOD 2026. GA is targeted for July 2026.
Yes. Fabric Data Warehouse supports T-SQL stored procedures, views, inline table-valued functions, and multi-table transactions. When migrating from Synapse, remove DISTRIBUTION and CLUSTERED COLUMNSTORE INDEX syntax — Fabric manages these automatically. Also remove manual index maintenance and CREATE STATISTICS calls.
Use CETAS in Synapse to export data to ADLS Gen2 as Parquet, create an OneLake Shortcut to that storage, then use COPY INTO in Fabric Warehouse to ingest at high speed. Remove distribution syntax and clustered columnstore definitions from DDL. Use the Fabric Migration Assistant from the workspace Migrate menu for schema assessment.
As of June 2026, AI Functions in Fabric Data Warehouse default to gpt-5-mini with low reasoning enabled. You can configure gpt-5.1 for complex transformations. AI Functions support sentiment analysis, text classification, summarization, and entity extraction in T-SQL and Dataflow Gen2. Pipelines pinned to retired gpt-4.1 must migrate to gpt-5.1 or gpt-5-mini.
All Fabric Warehouse tables are stored as Delta Parquet files in OneLake. Delta provides ACID transactions, time-travel, and efficient upserts via MERGE. The open format means the same data is readable by Spark, Power BI Direct Lake, and Fabric Data Agents — no ETL or copy between consumers.
⚠ Accuracy Disclaimer

This guide is verified against Microsoft Learn documentation, the Microsoft Fabric Community blog, and Microsoft Build 2026 announcements as of June 2026. GPU-accelerated Fabric Data Warehouse is in early access preview — features and availability will change. Always check the official documentation before production deployments. UIG Data Lab is an independent publication, not affiliated with or endorsed by Microsoft Corporation.

A.J. Data Engineering Researcher & Technical Writer · UIG Data LabAll articles →
A.J. researches and writes about data engineering, analytics architecture, Microsoft Fabric, and modern cloud data platforms. Coverage spans Microsoft Fabric, Power BI, Azure Data Engineering, Databricks, Snowflake, Apache Spark, dbt, Apache Airflow, and modern cloud data infrastructure. The focus is practitioner-level content that helps data professionals understand platform capabilities, evaluate technology decisions, optimize costs, and implement practical solutions using official documentation, product updates, community insights, and industry best practices.
Microsoft FabricData WarehouseDelta LakeT-SQLPower BIApache SparkdbtAzure SynapseOneLakeData Architecture

Scroll to Top