Dataflow Gen2 in Fabric – Microsoft Fabric Tutorial Series 2025

Welcome to our Microsoft Fabric Tutorial Series! In this detailed guide, we’ll explore Dataflow Gen2 — one of the most powerful data ingestion and transformation tools in Microsoft Fabric. Whether you’re a data engineer, analyst, or BI developer, understanding how Dataflow Gen2 works is crucial to building scalable and automated pipelines.

[Figure: Dataflow Gen2 Architecture in Microsoft Fabric]

🔍 What is Dataflow Gen2 in Microsoft Fabric?

Dataflow Gen2 is a low-code data transformation solution built on top of Power Query in Microsoft Fabric. It allows you to ingest, clean, shape, and load data into your Lakehouse, Warehouse, or other storage layers in a scalable, reusable, and automated fashion — without writing complex code.
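Although the experience is low-code, every Dataflow Gen2 query is backed by a Power Query M script that you can inspect in the Advanced Editor. As an illustrative sketch (the file path, column names, and filter condition are hypothetical), a single query that ingests, cleans, and shapes a CSV file might look like this:

```powerquery-m
// Illustrative only — path and column names are placeholders
let
    // Ingest: read a CSV file (65001 = UTF-8 encoding)
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ",", Encoding = 65001]),
    // Promote the first row to column headers
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Clean: set column types explicitly
    Typed = Table.TransformColumnTypes(Promoted, {{"OrderDate", type date}, {"Amount", type number}}),
    // Shape: keep only completed orders
    Filtered = Table.SelectRows(Typed, each [Status] = "Completed")
in
    Filtered
```

The UI builds these steps for you as you click; the M script is simply the record of those applied steps.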

[Figure: Power Query Transformation Experience in Dataflow Gen2]

💡 Key Features of Dataflow Gen2

  • Low-code/no-code: Build transformations with the drag-and-drop Power Query interface
  • Auto-load to OneLake: Automatically saves transformed data to Lakehouse or Warehouse
  • Reusable logic: You can reuse queries across different dataflows
  • Scheduled refresh: Easily schedule data refresh for automation
  • Support for various sources: SQL Server, REST API, CSV, SharePoint, Dataverse, and more

🏗️ Components of a Dataflow Gen2

A Dataflow Gen2 runs inside a Fabric workspace and consists of three major parts:

  • Source connectors: Ingest data from CSV, SQL Server, REST, SharePoint, Blob, Excel, etc.
  • Power Query transformations: Apply filters, joins, merges, calculated columns, and reshaping
  • Output destinations: Store the transformed data into Lakehouse, Warehouse, or other storage targets
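The transformation layer is where most of the work happens. As a hedged sketch of what a typical join-and-enrich step looks like in M (the query names `OrdersQuery` and `CustomersQuery` and all column names are hypothetical):

```powerquery-m
// Illustrative join between two queries already defined in the dataflow
let
    Orders = OrdersQuery,
    Customers = CustomersQuery,
    // Left-outer join orders to customers on CustomerID
    Merged = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"}, "Customer", JoinKind.LeftOuter),
    // Expand only the customer columns we need
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"Name", "Region"}),
    // Add a calculated column
    WithTotal = Table.AddColumn(Expanded, "GrossTotal", each [Quantity] * [UnitPrice], type number)
in
    WithTotal
```

The output destination is then configured per query in the UI, so the same transformation logic can be pointed at a Lakehouse table or a Warehouse table without rewriting it.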
[Figure: Dataflow Gen2 Source Connectors and Destination Options]

🛠️ Step-by-Step: Create Your First Dataflow Gen2

Let’s walk through creating a Dataflow Gen2 using the Fabric UI:

  • Step 1: In your Fabric workspace, select New > Dataflow Gen2
  • Step 2: Connect to a data source (e.g., SQL Server, a CSV file)
  • Step 3: Use the Power Query editor to transform the data (e.g., rename columns, filter rows)
  • Step 4: Choose a data destination – Lakehouse or Warehouse
  • Step 5: Save and publish the dataflow
  • Step 6: Optionally, schedule refreshes and monitor run history
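To make Step 3 concrete, the rename and filter operations could translate to M like this (the server, database, and column names below are hypothetical placeholders, not values from this tutorial):

```powerquery-m
// Illustrative — replace server, database, and column names with your own
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Rename columns to friendlier names
    Renamed = Table.RenameColumns(Orders, {{"ord_dt", "OrderDate"}, {"amt", "Amount"}}),
    // Filter rows to the current year only
    Filtered = Table.SelectRows(Renamed, each Date.Year([OrderDate]) = Date.Year(DateTime.LocalNow()))
in
    Filtered
```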

📦 Output Options: Where Can You Store the Data?

Dataflow Gen2 supports two primary output destinations:

  • Lakehouse: For analytical workloads using notebooks, Spark, Power BI, and machine learning
  • Warehouse: For structured reporting, T-SQL access, and Power BI semantic models

📈 Pricing and Capacity Model

[Figure: Dataflow Gen2 Pricing Model Overview]

Dataflow Gen2 pricing is based on the Fabric capacity backing your workspace. Refresh operations consume Capacity Units (CUs), and each capacity has a defined CU limit determined by its Fabric SKU (e.g., F2, F4).

💡 Best Practices for Dataflow Gen2

  • Keep transformations lightweight and optimized
  • Use filters early to reduce data volume
  • Enable load only on required queries
  • Use “reference” instead of “duplicate” to reuse queries efficiently
  • Monitor refresh time and failure logs
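The "filter early" and "reference, don't duplicate" practices can be sketched together in M (query names, server, and columns below are hypothetical): the base query filters as close to the source as possible so the filter can fold into the source query, and downstream queries reference the base rather than copying its steps.

```powerquery-m
// Base query "BaseOrders": filter early so it can fold to the source
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Recent = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
in
    Recent

// Downstream query: reference BaseOrders instead of duplicating its steps,
// so the shared filtering logic is maintained in one place
let
    Source = BaseOrders,
    ByRegion = Table.Group(Source, {"Region"}, {{"Total", each List.Sum([Amount]), type number}})
in
    ByRegion
```

With a reference, a fix to the base query's logic automatically flows to every downstream query; a duplicate would have to be updated in each copy.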

Stay tuned for the next post where we’ll dive into Data Mirroring in Microsoft Fabric.

