Data Mirroring in Fabric – Complete 2026 Guide

Complete 2026 guide to real-time data synchronization, CDC implementation, and production-grade mirroring patterns for Microsoft Fabric.

Updated January 2026

What is Data Mirroring in Fabric?

Data Mirroring in Fabric is a zero-ETL, real-time replication service that continuously synchronizes data from sources like Azure SQL, Snowflake, and PostgreSQL into OneLake. It automates Change Data Capture (CDC) to provide fresh analytics data without building complex pipelines.


Fundamentally, mirroring operates in two phases:

  • Initial Snapshot: Captures a consistent baseline from the source database
  • Continuous CDC (Change Data Capture): Streams inserts, updates, and deletes in near real-time
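Conceptually, the two phases behave like the following Python sketch. This is a toy in-memory simulation for intuition only, not the Fabric SDK; the key names, operation labels, and row shapes are illustrative assumptions:

```python
# Toy simulation of mirroring's two phases (NOT the Fabric SDK).

def initial_snapshot(source_rows):
    """Phase 1: copy a consistent baseline, keyed by primary key."""
    return {row["id"]: dict(row) for row in source_rows}

def apply_cdc(mirror, changes):
    """Phase 2: replay inserts/updates/deletes in log order."""
    for change in changes:
        op, row = change["op"], change["row"]
        if op in ("insert", "update"):
            mirror[row["id"]] = dict(row)   # upsert semantics
        elif op == "delete":
            mirror.pop(row["id"], None)
    return mirror

source = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
mirror = initial_snapshot(source)
apply_cdc(mirror, [
    {"op": "update", "row": {"id": 1, "status": "shipped"}},
    {"op": "insert", "row": {"id": 3, "status": "new"}},
    {"op": "delete", "row": {"id": 2}},
])
print(sorted(mirror))       # → [1, 3]
print(mirror[1]["status"])  # → shipped
```

The key property: after the snapshot establishes the baseline, replaying the change stream in order reproduces the source state.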

Why Mirroring Matters

Eliminates months of ETL development. Maintains data freshness. Preserves transaction order semantics. Enables real-time analytics without operational overhead—critical for financial reporting, inventory tracking, and anomaly detection.

Key Capabilities (January 2026)

Sub-Minute Latency

Changes from source typically appear in Fabric within 30–60 seconds, enabling near-real-time dashboards and alerting.

Managed Infrastructure

Fabric handles snapshots, CDC configuration, and data delivery. No custom CDC code required.

Multi-Source Support

Azure SQL, PostgreSQL, MySQL, Kafka, on-premises databases, and custom sources via Open Mirroring.

Delta Lake Integration

Data lands in Delta Lake automatically, enabling ACID guarantees and seamless integration with notebooks and SQL Warehouse.

Supported Sources & Architecture

Data Mirroring supports 15+ enterprise sources and custom systems. Choose the approach that matches your infrastructure:

Native Connectors (Managed by Microsoft)

  • Azure SQL Database & Managed Instance — CDC sourced from the transaction log, managed end-to-end by Fabric
  • Azure Database for PostgreSQL — Logical replication via WAL (Write-Ahead Log)
  • Azure Database for MySQL — Binlog-based CDC with continuous streaming
  • SQL Server (on-premises) — Via data gateway with SQL Server Agent CDC jobs
  • Azure Synapse Dedicated SQL Pools — Dedicated connector for warehouse syncing

Open Mirroring (Custom & Multi-Cloud)

For systems not covered by native connectors, Open Mirroring provides a flexible API and SDK for building custom adapters:

  • Amazon RDS (PostgreSQL, MySQL) — Via open mirroring with Fabric-provided SDK
  • Google Cloud SQL — Cross-cloud replication using open mirroring adapters
  • Apache Kafka & Event Streams — Ingest event logs and transform into mirrored tables
  • Custom & Legacy Systems — JDBC drivers, ETL tools, and API-based adapters
Check Microsoft Learn: The official Mirroring documentation lists the full current set of supported sources. Capabilities expand quarterly, so verify your source’s latest status before design.

Architecture Overview

┌─────────────────────┐
│   Source Database   │  (Azure SQL, PostgreSQL, Kafka, on-prem)
│  (Transaction Log)  │
└──────────┬──────────┘
           │  CDC / Events / Log Stream
┌──────────▼──────────────────────────────────┐
│  Fabric Mirroring Ingestion Engine          │
│   • Snapshot capture                        │
│   • Change stream processing                │
│   • Transaction ordering                    │
└──────────┬──────────────────────────────────┘
           │  (30–60 sec latency)
┌──────────▼──────────────────────────────────┐
│  OneLake / Delta Lake                       │
│   mirrored_db.orders     (Delta table)      │
│   mirrored_db.customers  (Delta table)      │
└──────────┬──────────────────────────────────┘
           │
   ┌───────┼──────────┬───────────┬─────────────┐
   ▼       ▼          ▼           ▼             ▼
  SQL   Notebook   Warehouse   Power BI   Python/Spark

Types of Mirroring & Use Cases

Choose the mirroring mode that aligns with your latency, control, and complexity requirements:

1. Source-Native Mirroring (Recommended for Cloud Databases)

Best for: Azure-hosted databases with native CDC support (Azure SQL, PostgreSQL, MySQL).

  • Managed end-to-end by Fabric
  • Automatic snapshot + CDC scheduling
  • Minimal configuration overhead
  • Sub-minute latency (typically 30–60 seconds)

2. Open Mirroring (Flexible & Extensible)

Best for: Custom sources, legacy systems, multi-cloud, or Kafka streams.

  • Custom adapter development using Fabric SDK
  • Full control over transformation logic
  • Support for any CDC mechanism (logs, APIs, webhooks)
  • Suitable for complex business logic

3. Metadata Mirroring (Schema & Governance)

Best for: Synchronizing schema changes and metadata across Fabric workspace.

  • Schema auto-discovery and registration
  • Data lineage and impact analysis
  • Governance catalog population
Use Case: Financial services firms mirror trading databases every 15 seconds to fuel real-time P&L dashboards, risk models, and regulatory reporting—all without maintaining custom ETL code.

Setup & Prerequisites

Prerequisites Checklist

Before You Start: Verify all prerequisites are met to avoid setup failures and data loss.
  • Required Fabric Capacity: An active Fabric capacity with available compute (verify the current minimum SKU for mirroring in the official docs).
  • Required Source Connectivity: Network access from Fabric to source (public IP or VNet peering).
  • Required CDC Enabled on Source: Enable transaction log / WAL / binlog CDC on your database.
  • Required Credentials & Permissions: Service account with read + CDC permissions on source.
  • Recommended Data Gateway: For on-premises sources, install and configure the Fabric data gateway.
  • Recommended Schema Validation: Document source schema and expected column types.

Step-by-Step Setup

1. Create Lakehouse & Destination Database

In Fabric, create a new Lakehouse. This will host your mirrored tables. Plan your schema upfront: one table per source entity.

2. Enable CDC on Source

Azure SQL: ALTER DATABASE [YourDB] SET CHANGE_TRACKING = ON

PostgreSQL: Set wal_level = logical in postgresql.conf

MySQL: Set binlog_format = 'ROW' in my.cnf

3. Configure Mirroring Connection

In Fabric, open the Lakehouse and select “New Mirrored Table”. Choose your source connector (native or open mirroring). Enter credentials and connection string. Test connectivity.

4. Select Tables & Start Snapshot

Choose which tables to mirror. Start with small tables (< 100GB) to validate. Monitor snapshot progress in the Mirroring Hub.

5. Validate & Enable CDC Streaming

Once snapshot completes, enable CDC streaming. Observe row counts and latency. Create alerts for replication lag.

-- Example: enable change tracking on Azure SQL
ALTER DATABASE SalesDB SET CHANGE_TRACKING = ON;
ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;
ALTER TABLE dbo.Customers ENABLE CHANGE_TRACKING;

-- Grant the service account access
-- (CONTROL is broad; scope it down to match your security policy)
GRANT CONTROL ON DATABASE::SalesDB TO [fabric-sa];

How Mirroring Works Under the Hood

Phase 1: Initial Snapshot

The snapshot phase establishes a consistent baseline by reading source tables at a specific point in time. Importantly, this captures the schema and all current rows. Fabric then writes them to Delta tables in OneLake with a _timestamp column marking the snapshot time.

Phase 2: Continuous CDC

After snapshot completion, Fabric continuously reads the source’s change log (transaction log, WAL, binlog) and applies changes in strict order. Each change is idempotent: inserts become UPSERT operations, deletes are marked or removed, and updates merge with existing rows.
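The idempotency guarantee can be illustrated with a small Python sketch: an incoming change is applied only when its timestamp is newer than what the mirror already holds, so replaying a batch (or receiving a stale duplicate) cannot regress the table. The function and field names here are illustrative, not Fabric internals:

```python
def merge_change(table, row, key="order_id", ts="_timestamp"):
    """Upsert only when the incoming change is newer (replay-safe)."""
    current = table.get(row[key])
    if current is None or row[ts] > current[ts]:
        table[row[key]] = dict(row)
    return table

orders = {}
merge_change(orders, {"order_id": 1, "status": "new",     "_timestamp": 10})
merge_change(orders, {"order_id": 1, "status": "shipped", "_timestamp": 20})
# Stale replay of the first change: ignored, state unchanged.
merge_change(orders, {"order_id": 1, "status": "new",     "_timestamp": 10})
print(orders[1]["status"])  # → shipped
```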

Timeline of events:
  T0:        Snapshot started (initial read)
  T1–T100:   Snapshot rows written to OneLake (4.2M rows, 8.5 GB)
  T101:      Snapshot complete, CDC begins
  T102–∞:    Changes stream in (avg 50–200 changes/sec)

Result: By T105, the mirrored table is caught up to within 4 seconds.

Type Mapping & Schema Handling

Fabric automatically maps source column types to Delta Lake equivalents. However, some mappings are ambiguous (e.g., SQL Server NVARCHAR(MAX) → Spark STRING). Handle these via:

  • Explicit casting in mirroring config (if supported)
  • Post-load notebooks that cast columns to desired types
  • Custom adapters (for open mirroring)
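A post-load cast step often starts from a simple lookup of source-to-Delta type mappings. The table below is a hand-picked illustration, not the authoritative mapping (check the type-mapping documentation for your specific connector):

```python
# Illustrative (NOT exhaustive or authoritative) SQL Server → Delta/Spark
# type mappings; verify against your connector's official mapping table.
TYPE_MAP = {
    "int": "INT",
    "bigint": "BIGINT",
    "nvarchar(max)": "STRING",    # ambiguous: max length is lost
    "datetime2": "TIMESTAMP",
    "decimal(18,2)": "DECIMAL(18,2)",
}

def delta_type(source_type: str) -> str:
    """Resolve a source column type, falling back to STRING."""
    return TYPE_MAP.get(source_type.lower(), "STRING")

print(delta_type("NVARCHAR(MAX)"))  # → STRING
print(delta_type("datetime2"))      # → TIMESTAMP
```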

Handling Schema Changes

When the source schema changes (new column, dropped column, type change), mirroring behavior depends on configuration:

  • Add Column: New column appears in mirrored table with nulls for historical rows
  • Drop Column: Column retained in mirrored table (data not deleted) for backward compatibility
  • Rename Column: Requires manual intervention; coordinate with governance and consumers
  • Type Change: May cause validation errors; route to anomalies table for review
Schema Change Coordination: Always notify data engineering and analytics teams before modifying source schema. Plan mirroring pauses or manual backfills if necessary.
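The add-column behavior described above (new column, nulls for historical rows) can be sketched as a backfill over already-mirrored rows; a minimal illustration, assuming dict-shaped rows:

```python
def apply_add_column(rows, column, default=None):
    """New source column: backfill historical mirrored rows with nulls."""
    for row in rows:
        row.setdefault(column, default)
    return rows

# Row 1 predates the schema change; row 2 arrived after it.
rows = [{"id": 1}, {"id": 2, "discount": 5}]
apply_add_column(rows, "discount")
print([r["discount"] for r in rows])  # → [None, 5]
```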

Design Patterns & Best Practices

Pattern 1: Landing → Transform → Serve

Separate raw mirrored data (landing) from curated business tables (serving). This improves traceability, error handling, and allows safe reruns.

Landing Layer (raw mirrored tables):
  mirrored_db.raw_orders     (from source, includes all columns)
  mirrored_db.raw_customers  (from source)

Transform Layer (notebooks):
  1. Validate: check for nulls, duplicates, range violations
  2. Enrich: join with reference data, add calculated columns
  3. Merge: upsert into curated tables

Serving Layer (output):
  curated.orders        (validated, deduplicated, enriched)
  curated.dim_customer  (slowly changing dimension)

Pattern 2: Idempotent Delta Merge

Use Delta MERGE to ensure downstream inserts/updates are idempotent (safe to replay):

MERGE INTO curated.orders AS target
USING staging.orders_validated AS source
  ON target.order_id = source.order_id
WHEN MATCHED AND source._timestamp > target._updated_ts
  THEN UPDATE SET *
WHEN NOT MATCHED
  THEN INSERT *;

Pattern 3: Late Arriving Facts

Mirrored data may occasionally arrive out of order or late. Handle this with:

  • _source_timestamp: Capture source transaction time (not ingestion time)
  • _batch_id: Link changes to snapshot or CDC batch for reconciliation
  • _confidence_level: Flag rows arriving after a threshold (e.g., > 5 min late)
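The threshold check from the last bullet can be sketched in Python; the 5-minute cutoff and the `_late` flag name are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

LATE_THRESHOLD = timedelta(minutes=5)  # assumed SLA cutoff

def tag_late(rows, now):
    """Flag rows whose source transaction time exceeds the lateness threshold."""
    for row in rows:
        row["_late"] = (now - row["_source_timestamp"]) > LATE_THRESHOLD
    return rows

now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
rows = tag_late([
    {"order_id": 1, "_source_timestamp": now - timedelta(minutes=1)},
    {"order_id": 2, "_source_timestamp": now - timedelta(minutes=30)},
], now)
print([r["_late"] for r in rows])  # → [False, True]
```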

Pattern 4: Anomaly Detection & DLQ

Route data quality issues to a Dead Letter Queue (DLQ) for manual review:

IF validation_errors > 0:
    WRITE to quarantine.orders_dlq
    SEND alert to data-quality team
    STOP processing until manual review
ELSE:
    PROCEED with merge to curated tables
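A runnable version of this routing step might look like the following; the validation rule and table names are illustrative placeholders:

```python
def route(rows, validate):
    """Split a batch into valid rows and a dead-letter queue (DLQ)."""
    valid, dlq = [], []
    for row in rows:
        (valid if validate(row) else dlq).append(row)
    return valid, dlq

# Example rule: order_id must be present and amount non-negative.
is_valid = lambda r: r.get("order_id") is not None and r.get("amount", 0) >= 0

valid, dlq = route(
    [{"order_id": 1, "amount": 50},
     {"order_id": None, "amount": 10},
     {"order_id": 3, "amount": -5}],
    is_valid,
)
print(len(valid), len(dlq))  # → 1 2
```

In production, `dlq` would be written to the quarantine table and trigger the alert, while `valid` proceeds to the merge.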

Best Practice: Freshness SLAs

Define and monitor replication lag SLAs:

  • Critical tables: < 1 minute lag (e.g., orders, payments)
  • Important tables: < 5 minute lag (e.g., customers, products)
  • Reference tables: < 1 hour lag (e.g., regions, categories)
Monitoring Query: Regularly compute replication lag, e.g. SELECT NOW() - MAX(_source_timestamp) AS lag FROM mirrored_db.orders, and alert whenever lag exceeds the table's SLA.
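The tiered SLAs above translate naturally into a per-table lag check; a minimal sketch, with the table names taken from the examples in the list:

```python
from datetime import timedelta

# Tiered freshness SLAs (example tables from the list above).
SLAS = {
    "orders": timedelta(minutes=1),
    "customers": timedelta(minutes=5),
    "regions": timedelta(hours=1),
}

def sla_breaches(observed_lag):
    """Return tables whose observed replication lag exceeds their SLA."""
    return sorted(t for t, lag in observed_lag.items() if lag > SLAS[t])

lags = {
    "orders": timedelta(seconds=45),
    "customers": timedelta(minutes=12),
    "regions": timedelta(minutes=20),
}
print(sla_breaches(lags))  # → ['customers']
```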

Operational Monitoring & Troubleshooting

Key Metrics to Track

Metric                           | Healthy Range                                | Action if Breached
Replication Lag                  | 30–90 seconds                                | Investigate network latency, CDC backlog, Fabric capacity
Snapshot Duration                | Depends on table size (1–100 GB = 5–60 min)  | If longer than expected, check source locks and network throughput
Failed Transactions              | 0 per hour                                   | Review error logs; restart connector if transient
Row Count Match (source vs sink) | 100% (after CDC catch-up)                    | Run reconciliation query; identify missing/duplicate rows
Capacity CU Usage                | < 80% peak during ingestion                  | Throttle snapshots, schedule during off-peak, scale capacity

Monitoring Setup (Fabric Monitor Hub)

  • Navigate to Workspace Settings → Mirroring → Monitor
  • View per-table status, last snapshot date, CDC lag
  • Set alerts for replication lag > 5 minutes, failed CDC batches
  • Export metrics to Power BI for trending

Reconciliation Query

-- Compare source and mirrored row counts
SELECT 'source' AS location, COUNT(*) AS row_count
FROM source_database.dbo.orders
UNION ALL
SELECT 'mirrored', COUNT(*)
FROM mirrored_db.orders;

-- Find missing rows
SELECT order_id FROM source_database.dbo.orders
EXCEPT
SELECT order_id FROM mirrored_db.orders;
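The same reconciliation logic, expressed as set operations for use in a notebook after pulling the key columns from both sides (a sketch; in practice the ID lists would come from the queries above):

```python
def reconcile(source_ids, mirrored_ids):
    """Return (missing_in_mirror, unexpected_in_mirror) as key sets."""
    src, mir = set(source_ids), set(mirrored_ids)
    return src - mir, mir - src

missing, extra = reconcile([1, 2, 3, 4], [1, 2, 4, 5])
print(sorted(missing), sorted(extra))  # → [3] [5]
```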

Common Issues & Fixes

Issue: Replication lag increases steadily.
Fix: Scale Fabric capacity, check for source locks, reduce CDC batch size.
Issue: Connector restarts frequently.
Fix: Review error logs for transient vs. persistent errors; add retry logic or increase timeout.
Issue: Row counts don’t match source after CDC is “caught up”.
Fix: Check for duplicate key violations, deleted rows not being marked, or filter conditions in mirroring config.

Security, Governance & Lineage

Access Control (Least Privilege)

  • Mirroring Setup: Only workspace admins or data engineers with mirroring permissions
  • Read Access: Analysts and BI teams: Reader access to curated tables only
  • Raw Mirrored Tables: Restrict to data engineering; prevent direct analytics queries
  • Credentials: Use Azure Key Vault service principals; never hardcode passwords

Data Masking & Sanitization

For PII (Personally Identifiable Information), apply masking in transformation notebooks:

-- Mask SSN and email in the curated customer table
SELECT
  customer_id,
  customer_name,
  CONCAT('***-**-', RIGHT(ssn, 4)) AS ssn_masked,
  CONCAT(SUBSTRING(email, 1, 3), '***@', SPLIT(email, '@')[1]) AS email_masked,
  region
FROM mirrored_db.raw_customers;
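The same masking rules in Python, for notebooks that prefer dataframe UDF-style logic; the exact masking format (last four SSN digits, first three email characters) mirrors the SQL above and is a policy choice, not a fixed standard:

```python
def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of an SSN."""
    return "***-**-" + ssn[-4:]

def mask_email(email: str) -> str:
    """Keep the first three characters of the local part and the domain."""
    local, domain = email.split("@", 1)
    return local[:3] + "***@" + domain

print(mask_ssn("123-45-6789"))       # → ***-**-6789
print(mask_email("alice@corp.com"))  # → ali***@corp.com
```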

Lineage & Data Catalog Registration

Register mirrored datasets in the Fabric Data Catalog:

  • Business Glossary: Tag columns with business terms (e.g., “Order Date” = order_ts)
  • Impact Analysis: Track which dashboards and reports consume mirrored data
  • Change Notifications: Alert downstream users when schema changes
  • Data Quality Rules: Register freshness, completeness, and accuracy SLAs

Audit Logging

Fabric automatically logs mirroring activities:

  • Snapshot start/completion
  • CDC streaming status changes
  • Failed transactions and errors
  • Access to mirrored tables (via query logs)
Governance Best Practice: Establish a data stewardship committee that reviews new mirrors quarterly and certifies them for production use.

Practical Recipes & End-to-End Examples

Recipe 1: Real-Time Sales Dashboard

Objective: Power a live P&L dashboard updated every 2 minutes

1. Mirror orders, line_items, customers from Azure SQL every 30 sec
2. Notebook (scheduled every 2 min):
   - Join mirrored.orders → mirrored.customers
   - Calculate revenue and margin by region and product
   - Write to curated.sales_summary
3. Power BI connects to curated.sales_summary
4. Dashboard auto-refreshes every 2 minutes
5. Alert: if orders < 100 in the last hour, notify the sales team

SLA: < 2.5 min latency (30 sec mirror lag + 2 min transform + 0.5 min BI refresh)

Recipe 2: ML Feature Store from Transactional Data

Objective: Continuously update customer features for a churn prediction model

1. Mirror customers, orders, returns from source
2. Feature engineering notebook (runs every 6 hours):
   - Aggregate orders in the last 30, 90, 365 days
   - Calculate purchase frequency, avg order value, return rate
   - Join with customer demographics
   - Write to features.customer_churn_v1
3. ML pipeline:
   - Read features.customer_churn_v1
   - Score customers with the trained model
   - Write predictions to curated.churn_scores
4. App queries curated.churn_scores for a real-time intervention list

Benefit: The model always operates on fresh (< 6 hours old) features

Recipe 3: Handling Late-Arriving Facts

Objective: Process late-arriving orders (orders arriving > 5 min after creation)

1. Mirrored orders include _source_timestamp (creation time in source)
2. Validation notebook:
   - Calculate lag = NOW() - _source_timestamp
   - If lag > 5 min, write to quarantine.late_orders
   - Else, proceed to merge
3. Late-orders processing:
   - Each hour, review the quarantine table
   - Apply business rules (reorder? refund? flag for investigation)
   - Merge into curated.orders_adjusted with late_flag = TRUE
4. Power BI filters:
   - Regular dashboard: uses curated.orders (no late arrivals)
   - Late-arrivals dashboard: shows quarantine + late-flagged orders

Benefit: Prevents inflated daily totals while preserving data quality

Recipe 4: Multi-Source Consolidation

Objective: Consolidate product data from 3 regional databases

1. Mirror products from:
   - US-SQL → mirrored_db.products_us
   - EU-PostgreSQL → mirrored_db.products_eu
   - APAC-MySQL → mirrored_db.products_apac
2. Consolidation notebook:
   - Union all three mirrored tables
   - Deduplicate by product_code (rank by _source_timestamp DESC)
   - Add a region_source column
   - Write to curated.products_global (single source of truth)
3. Power BI connects to curated.products_global

Benefit: Single consolidated view; easy to identify regional conflicts
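The dedupe-by-latest-timestamp step can be sketched in Python; the row shapes and region labels are illustrative, and in a notebook this would typically be a window-function rank over the unioned tables:

```python
def consolidate(*regional_tables):
    """Union regional tables, keeping the newest row per product_code."""
    best = {}
    for region, rows in regional_tables:
        for row in rows:
            row = dict(row, region_source=region)
            code = row["product_code"]
            if code not in best or row["_source_timestamp"] > best[code]["_source_timestamp"]:
                best[code] = row
    return best

products = consolidate(
    ("us",   [{"product_code": "A1", "_source_timestamp": 100}]),
    ("eu",   [{"product_code": "A1", "_source_timestamp": 120},
              {"product_code": "B2", "_source_timestamp": 90}]),
    ("apac", [{"product_code": "C3", "_source_timestamp": 80}]),
)
print(sorted(products))                 # → ['A1', 'B2', 'C3']
print(products["A1"]["region_source"])  # → eu
```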

Frequently Asked Questions – Data Mirroring in Fabric

Does mirroring support my on-premises database?

Yes, via the Fabric data gateway. Install and register the gateway in your on-premises network. Mirroring will connect through it securely. Confirm your DBMS version is compatible with the gateway’s CDC capabilities (SQL Server 2016+, PostgreSQL 10+, MySQL 5.7+).

What latency should I expect?

Sub-minute for healthy setups (30–90 sec). Snapshot duration depends on table size: 1 GB ≈ 1–2 min, 100 GB ≈ 30–60 min. After snapshot, CDC latency is typically 30–60 sec depending on network and source transaction volume.

Can I mirror to multiple workspaces?

Not directly. Each mirroring connection targets a single Fabric workspace. Workaround: Set up multiple mirroring connections in different workspaces pointing to the same source, or use OneLake shortcuts to share mirrored data across workspaces.

What happens when the source schema changes?

Mirroring adapts automatically for additions and type-safe changes. For breaking changes (column drop, type incompatibility), manually update the mirroring configuration and restart CDC. Coordinate with downstream consumers (dashboards, notebooks) before applying schema changes.

How do I backfill historical data?

Mirroring always starts with a full snapshot. If you need data before the snapshot, either: (a) restore a backup on source and re-snapshot, or (b) use a separate historical data load tool (e.g., Copy Activity or Dataflow Gen2) to fill in older periods.

Can I filter rows during mirroring?

Native mirroring doesn’t support WHERE clauses. If you need filtering, apply it post-load in a transformation notebook. Alternatively, use Open Mirroring with custom adapters to filter at ingestion.

What’s included in the Fabric price for mirroring?

Mirroring capacity usage is charged as part of your Fabric capacity bill. There’s no separate per-GB ingestion fee. Use the Fabric Pricing Calculator to estimate your costs.

How does mirroring compare to Data Pipelines + Copy Activity?

Mirroring is simpler and more automated (no custom CDC logic). Copy Activity is more flexible for complex transformations. Ideal: Mirroring for raw replication, Copy Activity for selective loads and transformations.

Ready to Enable Data Mirroring?

Start with a low-risk source (< 100 GB), validate replication lag and data quality, then scale to mission-critical systems. Combine mirroring with notebooks and pipelines to build production-grade analytics.

Treat mirroring as a low-latency ingest layer for near-real-time analytics. Apply deterministic transformations with notebooks or pipelines to produce curated, governed outputs. Monitor freshness, validate data quality, and secure access with role-based controls.

Learn more: UltimateInfoGuide.com | Fabric Tutorial Series | Pricing Calc

© 2026 UltimateInfoGuide. Last updated: January 2026. All rights reserved.
