Data Engineering · Microsoft Fabric

Fabric Lakehouse Tutorial — Complete 2026 Guide

Architecture, Medallion design, Materialized Lake Views cross-workspace extended lineage, OneLake storage tiers, Direct Lake, security, optimization, and every June 2026 update — all verified against Microsoft Learn.

Quick Answer

A Microsoft Fabric Lakehouse stores data in OneLake using open Delta Parquet format, combining data lake flexibility with warehouse-style query performance. Every Lakehouse gets a SQL analytics endpoint (read-only T-SQL over Delta tables) and a default semantic model for Power BI at creation time — no separate setup. As of June 2026, the two biggest additions are OneLake storage tiers and lifecycle management (Preview — automatically move historical data to Cool or Cold storage to cut costs) and extended lineage for Materialized Lake Views (define one refresh schedule in your Gold Lakehouse that cascades through Silver and Bronze automatically, even across workspaces).

📅 Last verified: June 2026 ⏱ ~16 min read ✍️ A.J., Data Engineering Researcher & Technical Writer · UIG Data Lab 🔗 Source: Microsoft Learn

What Creates When You Create a Lakehouse

When you create a Fabric Lakehouse, the platform automatically provisions three connected items. Understanding this prevents the most common beginner confusion about why tables appear in some interfaces but not others.

Item CreatedWhat It IsWho Uses It
LakehouseThe storage item — Delta tables under Tables/, raw files under Files/, both in OneLakeData engineers, Spark notebooks, pipelines
SQL analytics endpointRead-only T-SQL interface over the Lakehouse’s Delta tables. Supports views, functions, column-level securitySQL developers, Azure Data Studio, Power BI Desktop
Default semantic modelAuto-generated Power BI dataset over the SQL endpoint. Refreshes when tables changePower BI report authors, Copilot
Source: learn.microsoft.com/fabric/data-engineering/lakehouse-overview
⚠️
Only Delta Tables Appear in the SQL Endpoint

Files stored under the Files section as raw CSV, JSON, or Parquet are not exposed in the SQL analytics endpoint and do not appear in the default semantic model. Only managed Delta tables (under the Tables section, or created via spark.write.format("delta").saveAsTable()) are queryable via T-SQL and available to Power BI Direct Lake. This is the most common reason a table “doesn’t show up” after ingestion.

June 2026 Updates — What Changed

The June 2026 Fabric release (published June 2, 2026) added two significant Lakehouse-specific features and several OneLake platform improvements that affect cost management and cross-workspace engineering patterns.

🔗

Extended Lineage for Materialized Lake Views June 2026

MLVs now span Lakehouse boundaries. Define views in a Gold Lakehouse that reference Silver and Bronze tables in other Lakehouses, even across workspaces. One schedule in Gold cascades through the entire lineage in dependency order automatically. Faulted nodes in the lineage graph flag missing or inaccessible dependencies before the run starts. Source: Microsoft Learn.

💾

OneLake Storage Tiers & Lifecycle Management Preview

Three tiers: Hot (standard), Cool (lower storage cost, higher transaction cost), Cold (lowest storage cost, archival). Lifecycle management policies move files between tiers automatically based on creation date, last modified date, or last access date. Configure in Workspace Settings → OneLake → Lifecycle Management. Source: Microsoft Learn.

📊

OneLake Item-Size Reporting Preview

Workspace admins can now see storage usage for every Lakehouse item — visible data, system data, and soft-deleted data — from a single page with sort, search, and on-demand refresh. Previously required Azure Storage Explorer or Capacity Metrics app estimation. Source: June 2026 Feature Summary.

🔄

Refresh Materialized Lake View — Pipeline Activity Preview

Refresh a Materialized Lake View as a native step inside a Data Factory pipeline. Coordinate view refreshes with the upstream ingestion pipelines that feed them without separate scheduling infrastructure. Source: June 2026 Feature Summary.

🔐

GitHub Enterprise Cloud with Data Residency GA

Git integration for Fabric now supports ghe.com Enterprise Cloud instances with data residency requirements — Lakehouse notebooks, pipelines, and semantic models version-controlled within specific geographic boundaries for regulatory compliance. Source: June 2026 Feature Summary.

GPU-Accelerated Fabric Data Warehouse

Not Lakehouse-specific, but relevant for architectures using Lakehouse + Warehouse together. Fabric Data Warehouse now supports GPU acceleration via NVIDIA compute integration. Internal benchmarking (May 2026) showed up to 7× faster performance versus comparable external vendors for reporting workloads. Recognised as ACM SIGMOD Best Industry Paper of 2026.

Architecture & Medallion Design

Fabric Lakehouse architecture centres on OneLake — a single, tenant-wide logical data lake backed by Azure Data Lake Storage Gen2 with a hierarchical namespace. Every workspace appears as a container, and every Lakehouse appears as a folder inside that container. Spark, T-SQL, Power BI, and pipelines all read the same underlying Parquet files without copying data between engines.

Core Components

🌊

OneLake

Single tenant-wide logical lake. Supports ADLS Gen2 APIs and SDKs — Azure Databricks, Synapse, and custom ADLS applications connect without modification. Shortcuts reference data in other Lakehouses, ADLS, S3, or Dataverse without moving it.

📋

Delta Tables

ACID transactions, schema evolution, time travel. Stored as Parquet files with Delta transaction logs. SQL endpoint exposes them as relational tables. Power BI Direct Lake reads Parquet files directly without import. V-Order write-time optimisation (enabled by default) improves read performance across all Fabric engines.

⚙️

Spark Runtime

Fabric Runtime 1.3 (Spark 3.5, Delta Lake 3.x). Starter Pools reduce cold start to 5–15 seconds. High Concurrency mode (January 2026) allows up to 5 independent Lakehouse jobs to share one Spark session — reduces wait time and cost on smaller F-SKU capacities.

Medallion Architecture in Fabric Lakehouse

Microsoft recommends organising Lakehouse data into Bronze, Silver, and Gold layers inside a single Lakehouse using Lakehouse schemas (available when Lakehouse schemas are enabled) or across multiple Lakehouses connected by OneLake shortcuts and Materialized Lake Views.

LayerPurposeFabric ToolsPractical Example
BronzeRaw ingested data. Preserve original structure for replay and audit. Minimal transformation.Data Pipelines, Copy Job, Eventstream, Data Mirroring, NotebooksRaw POS transactions written hourly from SQL Server via Copy Job CDC.
SilverCleaned, conformed, de-duplicated. Business keys applied. Data quality rules enforced.Notebooks (Delta MERGE), Materialized Lake Views (SQL or PySpark), Dataflow Gen2Orders joined to customer dimension, invalid rows quarantined to an errors table.
GoldCurated, business-ready models optimised for BI, Direct Lake, AI, and API consumption.Materialized Lake Views, SQL endpoint, Power BI Direct Lake semantic modelsSales by region, product, channel with row-level security applied at semantic model level.
ℹ️
Single Lakehouse vs. Multiple Lakehouses for Medallion

A single Lakehouse with Lakehouse schemas (bronze schema, silver schema, gold schema) works well for small-to-medium teams where all engineers share the same workspace. Multiple Lakehouses per layer — Bronze Lakehouse, Silver Lakehouse, Gold Lakehouse — works better for large teams with separate ownership boundaries, different security policies per layer, or when Gold data needs to be shared across multiple downstream workspaces via shortcuts. Extended MLV lineage (June 2026) makes the multi-Lakehouse pattern much easier to orchestrate: one schedule in the Gold Lakehouse refreshes the entire chain.

Field note — A.J., Data Engineering Researcher

The question of single Lakehouse versus multiple Lakehouses for Medallion comes down to security boundaries and team ownership, not performance. A Gold Lakehouse shared with a Reporting workspace via shortcut gives the reporting team read access to Gold tables without giving them any access to Bronze or Silver. That separation is not possible inside a single Lakehouse — workspace RBAC is at the workspace level, not the schema level. Design the Lakehouse topology around who owns what and who needs to see what, then let shortcuts and MLV lineage handle the cross-Lakehouse refresh orchestration.

Materialized Lake Views — GA & June 2026 Updates

Materialized Lake Views reached General Availability in March 2026. They let you define transformation logic in SQL or PySpark — Fabric materialises the result as a Delta table and manages incremental refresh, dependency tracking, and retry automatically. The June 2026 release added cross-workspace extended lineage, changing how multi-Lakehouse Medallion orchestration works.

What MLVs Replace

Before MLVs, a Silver-to-Gold transformation required a Spark notebook triggered by a pipeline, with manual dependency management between tables. An engineer had to know that Gold table A depends on Silver table B which depends on Bronze table C, and schedule three separate pipelines in the right order. MLVs declare these dependencies as SQL or PySpark, and Fabric resolves the execution order automatically — including across Lakehouses as of June 2026.

Key Capabilities (Post-GA, March 2026)

📐

SQL and PySpark Authoring

  • Write MLV definitions in SQL or PySpark inside a notebook
  • Requires Lakehouse schemas enabled and Fabric Runtime 1.3
  • Replace an MLV definition in place — Fabric validates the new logic, swaps it, and preserves refresh history and lineage
  • Downstream dependencies remain intact after a Replace
🔄

Incremental and Dependency-Aware Refresh

  • Partition-aware incremental refresh — only changed partitions are reprocessed
  • Dependency graph auto-generated from view definitions
  • Three refresh modes: current view only, current view + intra-Lakehouse dependencies, current view + cross-Lakehouse extended lineage
  • Multiple independent schedules per Lakehouse (not just one global schedule)
🌐

Extended Lineage — June 2026

  • MLVs now reference source tables in other Lakehouses across workspaces
  • Enable Extended Lineage toggle in Schedules pane, select upstream Lakehouses to include
  • Fabric resolves full dependency order across Lakehouses and refreshes in sequence
  • Faulted nodes flag missing or inaccessible dependencies before execution
  • One Gold Lakehouse schedule cascades Bronze → Silver → Gold automatically
🔍

Data Quality Constraints

  • Define constraints directly in MLV definitions — rows failing constraints are quarantined automatically
  • Quality metrics visible in the lineage graph per view node
  • Managed MLVs provide automatic monitoring, retries, and centralised visibility — notebook-triggered refreshes do not

Gold Sales Summary — Materialized Lake View Definition

Spark SQL — MLV definition (requires Runtime 1.3, Lakehouse schemas enabled)
-- Create a Gold-layer summary MLV
-- Source tables live in Silver schema (same or other Lakehouse via shortcut)
CREATE MATERIALIZED LAKE VIEW gold.sales_summary
AS
SELECT
    c.region,
    p.category,
    SUM(f.sales_amount)            AS total_sales,
    COUNT(*)                       AS txn_count,
    COUNT(DISTINCT f.order_id)    AS unique_orders
FROM   silver.fact_sales     f
JOIN   silver.dim_customer   c  ON f.customer_id = c.customer_id
JOIN   silver.dim_product    p  ON f.product_id  = p.product_id
GROUP BY c.region, p.category;

-- After creation: appears as a Delta table in the Lakehouse explorer
-- Queryable via SQL endpoint and Power BI Direct Lake
-- Refresh schedule: Lakehouse → Materialized lake views → Schedules
MLV vs. Notebook for Gold Layer — When to Use Which

Use Materialized Lake Views when the transformation logic is expressible in SQL or PySpark DataFrame API and you want automatic dependency tracking, incremental refresh, and quality monitoring without orchestration overhead. Use Spark notebooks (with explicit Delta MERGE patterns) when you need complex iterative logic, custom Python libraries, ML feature generation, or CDC patterns that require row-level upsert control. Both output Delta tables consumable by the SQL endpoint and Direct Lake.

Loading Data into Fabric Lakehouse

Fabric supports multiple ingestion paths into Lakehouse, all landing data in OneLake. The right method depends on source type, latency requirement, and whether the team prefers code-first or low-code tooling.

Data Pipelines — Copy Activity and Copy Job

Data Pipelines connect to 90+ sources and land data into Lakehouse tables or Files. Copy Activity gives full control over schema mapping, partition options, and fault tolerance. Copy Job Activity (GA April 2026) is the no-code equivalent with automatic schema detection, bulk/incremental/CDC in one item, and SCD Type 2 support for Fabric Warehouse. CDC for SQL Server, Azure SQL, and Azure SQL MI is now GA; auto-partition support for Oracle, Lakehouse, and SAP HANA is in Preview.

PySpark — write raw Parquet landing to Delta Bronze table
# Read raw files from ADLS landing zone
df = (spark.read
      .format("parquet")
      .load("abfss://landing@storage.dfs.core.windows.net/raw/sales/"))

# Write as managed Delta table in Bronze schema
(df.write
   .format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("bronze.sales_raw"))

Dataflow Gen2 — Low-Code Ingestion

Dataflow Gen2 uses Power Query with 200+ connectors to load and transform data into Lakehouse tables — no Spark knowledge required. The June 2026 release added Mapping Data Flow transformations inside Dataflow Gen2, bringing ADF Mapping Data Flow logic into the Fabric low-code experience. Output directly to Lakehouse managed tables, queryable immediately via the SQL endpoint and Power BI.

Spark Notebooks — Complex ETL and CDC

Notebooks are the right tool for complex ingestion: CDC with Delta MERGE, custom schema enforcement, data quality gates, feature engineering, and multi-source joins. Use mssparkutils.runtime.getParameterValue() to receive pipeline parameters so notebooks are parameterised and rerunnable rather than hardcoded.

PySpark — Delta MERGE upsert (idempotent CDC pattern)
from delta.tables import DeltaTable
from pyspark.sql.functions import current_timestamp, lit

RUN_DATE = mssparkutils.runtime.getParameterValue("run_date", "2026-06-06")

# Read incremental source data
df_inc = (spark.read.format("delta")
          .table("bronze.sales_raw")
          .filter(col("load_date") == lit(RUN_DATE))
          .dropDuplicates(["order_id"]))

# MERGE to Silver table — safe to rerun for the same run_date
target = DeltaTable.forName(spark, "silver.fact_sales")
(target.alias("tgt")
       .merge(df_inc.alias("src"), "tgt.order_id = src.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

Eventstream — Real-Time Ingestion

Fabric Eventstream ingests streaming data from Azure Event Hubs, IoT Hub, Kafka, MQTT, and CDC sources. Route streams to Lakehouse Delta tables, Eventhouse KQL databases, or both simultaneously. June 2026 added GA support for Apache Kafka and Azure Service Bus connectors, pagination support for the HTTP connector (Preview), and Eventstream observability through Workspace Monitoring (Preview).

Mirroring and OneLake Shortcuts

Data Mirroring keeps operational databases (Azure SQL DB, Cosmos DB, Snowflake, Databricks Unity Catalog) in sync as Delta tables in OneLake without custom ETL — near-real-time replication with zero code. OneLake shortcuts virtualise data from ADLS Gen2, S3, or other Lakehouses as folders inside your Lakehouse without copying it. June 2026 added network security controls for Mirroring — workspace-level outbound access protection for mirrored connections.

SQL Endpoint, Direct Lake & Power BI

The SQL analytics endpoint is auto-generated at Lakehouse creation and exposes every managed Delta table as a read-only T-SQL relation. Connect with Power BI Desktop, Azure Data Studio, or any T-SQL client using Entra ID authentication against the Fabric connection endpoint.

T-SQL — query via SQL analytics endpoint
SELECT
    product_name,
    SUM(revenue)    AS total_revenue,
    COUNT(order_id) AS order_count
FROM  gold.sales_summary
GROUP BY product_name
ORDER BY total_revenue DESC;

June 2026 — SQL Endpoint Updates

The June 2026 release added several SQL analytics endpoint improvements: CI/CD Support for SQL analytics endpoint (Preview) — version-control and promote SQL endpoint objects across environments; Enhanced metadata sync (Preview) — improved consistency between Lakehouse table changes and SQL endpoint schema; Time Travel via SQL analytics endpoint (Preview) — query historical Delta table versions using SQL syntax; and Configurable Retention (Preview) for controlling Delta history depth.

Power BI Direct Lake Mode

Direct Lake allows the Power BI Analysis Services engine to read Delta Parquet files directly from OneLake memory — no import job, no scheduled refresh window. Reports reflect data as soon as the upstream ETL or MLV refresh writes new files to the Gold Delta table.

🚨
Direct Lake Fallback — Four Root Causes
  • Unsupported DAX functions — some functions force a fallback; check the Direct Lake function compatibility list on Microsoft Learn
  • OneSecurity enabled on the Lakehouse — Direct Lake does not support Lakehouses with OneLake data access roles (OneSecurity preview) enabled; manage RLS at the semantic model level instead
  • Model exceeds SKU memory framing — Direct Lake frames column data into memory; if the model exceeds the F-SKU framing limit, it falls back to DirectQuery for that query
  • Unsupported relationship cardinality or column types — binary columns, certain calculated columns, and many-to-many relationships without bridge tables can trigger fallback

Diagnose via Power BI Desktop Performance Analyzer — look for DirectQuery storage engine events. See the Direct Lake fallback fix guide for resolution steps.

Security & Governance

Fabric Lakehouse security operates at three layers: workspace RBAC, item-level permissions, and data-plane controls. All identity is anchored to Microsoft Entra ID — no separate credential store.

LayerMechanismPractical Use
Workspace RBACAdmin, Member, Contributor, Viewer rolesEngineers are Members. Analysts are Viewers — they can query the SQL endpoint but cannot edit notebooks or pipelines.
Item permissionsReadAll, ReadData, Build permissions per Lakehouse itemGrant ReadData to a reporting service principal without exposing the full workspace.
SQL endpoint RLSRow-level security defined in the SQL endpoint or semantic modelRegional managers see only their region’s rows in Gold tables.
Column maskingDynamic data masking in SQL analytics endpoint (GA)Mask PII columns from non-admin roles without changing ETL output.
Sensitivity labelsMicrosoft Purview Information Protection integrationLabels applied in M365 inherit automatically to Fabric Lakehouse items and reports.
Customer-managed keysGA — OneLake data encrypted with customer-owned keys in Azure Key VaultRequired for some regulated industries. Extended in 2026 to support Key Vaults behind a firewall (Preview).
⚠️
OneSecurity (Preview) and Direct Lake — Known Incompatibility

OneLake data access roles (OneSecurity preview) — which enforce table, column, and row-level permissions at the OneLake storage level — are not compatible with Direct Lake semantic models on the same Lakehouse as of June 2026. Enabling OneSecurity on a Lakehouse causes Direct Lake models to fail with an access error and fall back to DirectQuery. Manage RLS at the Power BI semantic model level for Lakehouses serving Direct Lake until this limitation is resolved.

OneLake Shortcuts — Permission Requirements

When creating shortcuts to external ADLS Gen2, S3, or cross-workspace Lakehouses, the identity accessing the shortcut (your Entra ID in interactive sessions, the workspace identity or service principal in pipeline runs) must have at least Storage Blob Data Reader on the target ADLS account, or equivalent permissions on the target Lakehouse workspace. Shortcut authorization failures are almost always a permission mismatch between the accessing identity and the target storage. See the OneLake shortcut authorization fix guide for the full diagnostic checklist.

Governance and Lineage

Microsoft Purview captures lineage from sources through Lakehouse ingestion pipelines, MLVs, SQL endpoint views, semantic models, and Power BI reports. As of 2026, cross-workspace lineage for non-Power BI Fabric assets has a known limitation in Purview — MLV extended lineage (June 2026) addresses the orchestration side of this by making refresh dependency order visible inside Fabric itself, while the Purview cross-workspace lineage gap remains on the roadmap.

Optimization, Maintenance & Cost Control

Performance and cost in a Fabric Lakehouse depend on Delta file layout, partitioning strategy, query patterns, and OneLake storage tier decisions. The June 2026 storage tiers feature adds a new cost lever that did not exist previously.

OneLake Storage Tiers — Cost Strategy

All OneLake data defaults to the Hot tier. The three tiers have different cost trade-offs: Hot has the lowest transaction cost and is correct for Delta tables actively queried by Spark, SQL endpoint, or Direct Lake. Cool has lower storage cost but higher transaction cost — suitable for historical partitions accessed monthly or less. Cold has the lowest storage cost with the highest transaction cost and retrieval charges — suitable for compliance archives accessed less than once per year.

Lifecycle management — configure in Workspace Settings → OneLake → Lifecycle Management
# Example lifecycle rule (configured in Fabric UI or via API, not T-SQL)
# Rule: Move files not accessed in 90 days to Cool tier
Rule name:  move_old_bronze_to_cool
Scope:      /workspaces/{workspace_id}/lakehouses/{lakehouse_id}/Tables/bronze.*/
Action:     ChangeTier → Cool
Condition:  LastAccessedDaysAgo >= 90

# Rule: Move files not accessed in 365 days to Cold tier
Rule name:  move_archive_to_cold
Scope:      /workspaces/{workspace_id}/lakehouses/{lakehouse_id}/Tables/bronze.*/
Action:     ChangeTier → Cold
Condition:  LastAccessedDaysAgo >= 365

# NOTE: Delta transaction log files should remain Hot —
# exclude _delta_log/ paths from cooling rules
⚠️
Storage Tier and Delta Transaction Logs

Delta transaction log files (_delta_log/) are read on every table open operation. Moving Delta log files to Cool or Cold tier significantly increases the transaction cost of every Spark job and SQL query on that table, because log reads are charged at the higher tier rate. Scope lifecycle rules to data Parquet files only, not to the _delta_log/ directory. Hot tier for transaction logs is always correct — they are small and accessed constantly.

Table Maintenance — OPTIMIZE and VACUUM

Incremental writes create many small Parquet files over time. OPTIMIZE compacts them into larger files — Power BI Direct Lake performance depends on this. VACUUM removes old Delta file versions outside the retention window. As of March 2026, the Lakehouse Maintenance Activity in Data Factory pipelines (Preview) runs VACUUM and OPTIMIZE as a native pipeline step without a separate notebook.

Spark SQL — table maintenance (also available via Lakehouse Maintenance Activity)
-- Compact small files and co-locate data for common filter patterns
OPTIMIZE gold.sales_summary
ZORDER BY (region, category);

-- Remove Delta versions older than 7 days
-- Run during maintenance windows, not while active queries run
VACUUM gold.sales_summary RETAIN 168 HOURS;

-- Check current table health
DESCRIBE DETAIL gold.sales_summary;

Partition Strategy

Partition large fact tables by date or high-cardinality business keys so Spark and SQL endpoint queries only scan relevant partitions. Target 256 MB–1 GB per partition file after OPTIMIZE runs. Too many small partitions (under 128 MB) hurt metadata reads; too few large partitions (over 4 GB) hurt parallelism. For time-series tables, partition by year+month rather than day to avoid thousands of partitions accumulating over years.

Field note — A.J., Data Engineering Researcher

The OneLake storage tiers feature (June 2026) is the first cost lever in Fabric that does not require changing pipeline or query logic. For organisations retaining years of Bronze data for compliance, moving Bronze partitions older than 90 days to Cool and older than 365 days to Cold can cut OneLake storage costs by 40–60% on those partitions, with no impact on active Silver and Gold layer performance. The key discipline is excluding Delta transaction logs from any cooling policy — that single mistake turns a cost optimisation into a performance problem.

Real-World Use Cases

Retail Analytics — Real-Time KPIs

POS data arrives via Data Pipelines Copy Job (CDC from SQL Server) into Bronze. Online orders stream through Eventstream into a separate Bronze table. Product catalogue comes from a Warehouse shortcut — no data movement. Materialized Lake Views normalise and join Bronze tables into a Silver customer-order fact. A Gold MLV aggregates sales by region, product, and channel. Power BI Direct Lake serves the Gold table — report latency matches the Eventstream ingestion rate, not a scheduled refresh window.

Finance — Power BI Premium to Fabric Lakehouse Migration

Legacy Power BI Premium datasets using Import mode are migrated to Fabric Lakehouse-backed Direct Lake semantic models via the Power BI Premium to Fabric migration path. Scheduled import refreshes are replaced by pipeline-triggered MLV refreshes — report data is as fresh as the last ETL run, not the last scheduled import. RLS moves from Power BI dataset-level to semantic model-level with Entra ID group mapping.

AI and RAG Pipelines on Fabric Lakehouse

Documents, embeddings, and structured reference data stored in a Fabric Lakehouse are accessible to Azure OpenAI via native OneLake integration — no custom ETL pipeline required to get enterprise data into the AI context. Fabric Data Agents (GA in Microsoft 365 Copilot, June 2026) can query Lakehouse tables directly using natural language, with the NL2SQL engine (Improved, June 2026 Preview) translating questions to T-SQL against the SQL analytics endpoint.

Manufacturing and IoT

Factory telemetry streams from IoT Hub through Eventstream into both a Lakehouse Bronze table and an Eventhouse KQL database simultaneously. The KQL database serves sub-second real-time dashboards with time-series visualisation (Preview, June 2026). The Lakehouse Bronze table feeds Silver and Gold MLVs for historical analysis. Predictive maintenance models trained in Fabric Data Science notebooks read from the Gold Lakehouse and write predictions back as a new Delta table, also queryable via Direct Lake.

Frequently Asked Questions – Fabric Lakehouse Tutorial

Fabric Lakehouse stores both structured and unstructured data in open Delta format with multi-engine access via Spark, SQL, and Power BI Direct Lake. It is the correct home for raw ingestion, data science workloads, feature engineering, and multi-format data. Data Warehouse provides a fully transactional T-SQL engine with DML (INSERT, UPDATE, DELETE), stored procedures, cross-database queries, and workload isolation — better for governed relational workloads and financial models that need full ACID write access via SQL. Most enterprise architectures use both: Lakehouse for the engineering and data science layers, Warehouse for curated SQL-heavy consumption layers that benefit from transactional control.
Only managed Delta tables (under the Tables section, created via spark.write.format("delta").saveAsTable() or the Lakehouse table creation UI) appear in the SQL analytics endpoint. Files stored as raw CSV, JSON, or Parquet under the Files section are not exposed. The fix is always the same: write or convert the data to a Delta table using saveAsTable(). If you used .save("Tables/my_table") without saveAsTable(), the files exist in OneLake but are not registered as a managed table — the SQL endpoint cannot see them. Register them by running spark.catalog.createTable("my_table", path="Tables/my_table", source="delta").
OneLake storage tiers (Preview, June 2026) give three options: Hot — standard pricing, correct for active Delta tables queried regularly by Spark, SQL endpoint, or Direct Lake. Cool — lower storage cost, higher transaction cost, suitable for historical partitions accessed monthly or less. Cold — lowest storage cost, highest transaction and retrieval cost, correct for compliance archives accessed less than once per year. Lifecycle management policies move files between tiers automatically based on creation, last modified, or last accessed date — configure in Workspace Settings → OneLake → Lifecycle Management. The critical rule: never move Delta transaction log files (_delta_log/) to Cool or Cold — they are read on every table open and the higher transaction cost offsets any storage saving.
Extended Lineage for MLVs (June 2026) enables a Gold Lakehouse to define one refresh schedule that cascades through Silver and Bronze Lakehouses, even across different workspaces. Enable the Extended Lineage toggle in the Schedules pane and select which upstream Lakehouses to include. Fabric resolves the full dependency order — Bronze refreshes first, then Silver, then Gold — without needing separate schedules in each Lakehouse. The lineage graph shows every upstream view, source table, and shortcut with name, Lakehouse, and type. Faulted nodes (red) flag missing or inaccessible dependencies before the run executes, so you know about broken dependencies before they cause a failed Gold refresh. Three refresh modes: current view only, current Lakehouse dependencies, or full cross-Lakehouse extended lineage. Source: Microsoft Learn — Schedule a Materialized Lake View Refresh.
Diagnose first — run Performance Analyzer in Power BI Desktop and look for DirectQuery storage engine events in the log. Four root causes: (1) Unsupported DAX functions — check the Direct Lake DAX compatibility list on Microsoft Learn and rewrite or remove unsupported functions; (2) OneSecurity enabled on the Lakehouse — OneLake data access roles (Preview) block Direct Lake; manage RLS at the semantic model level instead; (3) Model exceeds F-SKU framing limit — reduce model size by removing unused columns, measures, or split into multiple semantic models; (4) Unsupported column types or relationship cardinality — binary columns and certain many-to-many relationships without a bridge table trigger fallback. The Direct Lake fallback fix guide covers all four cases with resolution steps.
⚠ Accuracy Disclaimer

Feature descriptions are based on official Microsoft Learn documentation and the Fabric June 2026 Feature Summary (published June 2, 2026). Preview features are subject to change. Verify current feature status at learn.microsoft.com/fabric/data-engineering/lakehouse-overview and the Fabric Updates Blog. UIG Data Lab is an independent publication, not affiliated with or endorsed by Microsoft Corporation.

A.J. Data Engineering Researcher & Technical Writer · UIG Data Lab
A.J. researches and writes about data engineering, analytics architecture, Microsoft Fabric, and modern cloud data platforms. Coverage spans Microsoft Fabric, Power BI, Azure Data Engineering, Databricks, Snowflake, Apache Spark, dbt, Apache Airflow, and modern cloud data infrastructure. The focus is practitioner-level content that helps data professionals understand platform capabilities, evaluate technology decisions, optimise costs, and implement practical solutions using official documentation, product updates, community insights, and industry best practices. His writing covers real decisions from real deployments — not documentation rewrites.
Microsoft Fabric Lakehouse OneLake Delta Lake Direct Lake Materialized Lake Views Power BI Apache Spark Medallion Architecture Azure

Scroll to Top