Welcome to our Microsoft Fabric Tutorial Series! In this detailed guide, we’ll explore Dataflow Gen2 — one of the most powerful data ingestion and transformation tools in Microsoft Fabric. Whether you’re a data engineer, analyst, or BI developer, understanding how Dataflow Gen2 works is crucial to building scalable and automated pipelines.

🔍 What is Dataflow Gen2 in Microsoft Fabric?
Dataflow Gen2 is a low-code data transformation solution built on top of Power Query in Microsoft Fabric. It allows you to ingest, clean, shape, and load data into your Lakehouse, Warehouse, or other storage layers in a scalable, reusable, and automated fashion — without writing complex code.

💡 Key Features of Dataflow Gen2
- Low-code/no-code: Build transformations with a drag-and-drop Power Query interface
- Auto-load to OneLake: Automatically lands transformed data in a Lakehouse or Warehouse
- Reusable logic: Reuse queries across different dataflows
- Scheduled refresh: Schedule data refreshes for automation
- Support for various sources: SQL Server, REST APIs, CSV files, SharePoint, Dataverse, and more

🏗️ Components of a Dataflow Gen2
A Dataflow Gen2 runs inside a Fabric workspace and consists of three major parts:
- Source connectors: Ingest data from CSV, SQL Server, REST, SharePoint, Blob, Excel, etc.
- Power Query transformations: Apply filters, joins, merges, calculated columns, and reshaping
- Output destinations: Store the transformed data into Lakehouse, Warehouse, or other storage targets
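
Conceptually, these three parts form a classic extract–transform–load pipeline. The following is a minimal, self-contained Python sketch of that flow — plain standard library, not the Fabric API — with hypothetical sample data standing in for a real source:

```python
import csv
import io

# Hypothetical sample data standing in for a source connector (e.g., a CSV file).
RAW_CSV = """order_id,region,amount
1,EMEA,120.50
2,APAC,75.00
3,EMEA,310.25
"""

def extract(raw: str):
    """Source connector: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Power Query-style transformations: filter rows, add a calculated column."""
    out = []
    for row in rows:
        if row["region"] == "EMEA":  # filter step
            # Hypothetical calculated column (illustrative conversion rate).
            row["amount_eur"] = float(row["amount"]) * 0.9
            out.append(row)
    return out

def load(rows):
    """Output destination: here we just return rows; in Fabric this step
    would write the result to a Lakehouse or Warehouse table."""
    return {"rows_written": len(rows), "data": rows}

result = load(transform(extract(RAW_CSV)))
print(result["rows_written"])  # 2
```

In a real dataflow, each of these stages is configured in the Power Query editor rather than coded by hand; the sketch only shows how the pieces relate.
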

🛠️ Step-by-Step: Create Your First Dataflow Gen2
Let’s walk through creating a Dataflow Gen2 using the Fabric UI:
- Step 1: Go to your Fabric workspace > Dataflows Gen2 > New dataflow
- Step 2: Select a data source (e.g., SQL Server, CSV)
- Step 3: Use the Power Query Editor to transform the data (e.g., rename columns, filter rows)
- Step 4: Choose a destination – Lakehouse or Warehouse
- Step 5: Save and publish the dataflow
- Step 6: Optionally, schedule refreshes or monitor the refresh history
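Refreshes can also be triggered programmatically. The Fabric REST API exposes a "run on-demand item job" endpoint for workspace items; the sketch below only builds the request so it can run anywhere. Note the `jobType=Refresh` value and the IDs are assumptions/placeholders — verify the exact job type for Dataflow Gen2 against the official Fabric REST API documentation before use:

```python
# Sketch: constructing a request to the Fabric REST API's on-demand item job
# endpoint to refresh a dataflow. jobType and IDs are placeholders/assumptions.

def build_refresh_request(workspace_id: str, item_id: str, token: str):
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/items/{item_id}/jobs/instances?jobType=Refresh"
    )
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return url, headers

url, headers = build_refresh_request("my-workspace-guid", "my-dataflow-guid", "<aad-token>")
# To actually trigger the job you would POST to `url` with `headers`,
# e.g. requests.post(url, headers=headers), using a valid Entra ID token.
print(url)
```
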

📦 Output Options: Where Can You Store the Data?
Dataflow Gen2 supports two primary output destinations within Fabric:
- Lakehouse: For analytical workloads using notebooks, Spark, Power BI, and machine learning
- Warehouse: For structured reporting, T-SQL access, and Power BI semantic models

📈 Pricing and Capacity Model
Dataflow Gen2 pricing is based on the Fabric capacity backing your workspace. Refresh operations consume Capacity Units (CUs), and the number of CUs available is determined by your Fabric SKU — the F-SKU number corresponds to its CU count (e.g., an F2 provides 2 CUs, an F4 provides 4).
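
To make that concrete, here is a back-of-the-envelope CU budget calculation. The per-refresh consumption figure is purely hypothetical, and real capacities apply smoothing and share CUs across all workloads, so treat this as illustrative arithmetic only:

```python
def cu_seconds_per_day(sku_cus: int) -> int:
    """A capacity with N CUs provides N CU-seconds of compute every second."""
    return sku_cus * 24 * 60 * 60

# An F2 capacity (2 CUs) provides 172,800 CU-seconds per day.
print(cu_seconds_per_day(2))  # 172800

# Hypothetical sizing: if one dataflow refresh consumed ~600 CU-seconds,
# an F2 could absorb roughly 288 such refreshes per day — ignoring
# smoothing and every other workload sharing the capacity.
print(cu_seconds_per_day(2) // 600)  # 288
```
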

💡 Best Practices for Dataflow Gen2
- Keep transformations lightweight and optimized
- Use filters early to reduce data volume
- Enable data loading only on queries whose output you actually need; disable it for intermediate queries
- Use “Reference” instead of “Duplicate” to reuse query logic without copying it
- Monitor refresh time and failure logs
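
The "filter early" advice is worth seeing in action. This pure-Python sketch (not Fabric code) counts how many rows an expensive transformation touches when the filter runs after it versus before it — same result, a quarter of the work:

```python
# Illustration of "use filters early": push the filter before the expensive
# step so fewer rows flow through it.

rows = [{"region": "EMEA" if i % 4 == 0 else "OTHER", "amount": i} for i in range(1000)]

transform_calls = {"late": 0, "early": 0}

def expensive_transform(row, mode):
    transform_calls[mode] += 1  # count how many rows the transform touches
    return {**row, "amount_x2": row["amount"] * 2}

# Filter *after* transforming: the transform runs on all 1000 rows.
late = [r for r in (expensive_transform(r, "late") for r in rows) if r["region"] == "EMEA"]

# Filter *before* transforming: the transform runs on only 250 rows.
early = [expensive_transform(r, "early") for r in rows if r["region"] == "EMEA"]

print(transform_calls)           # {'late': 1000, 'early': 250}
print(len(late) == len(early))   # True — same output, less work
```

In Power Query terms, this is why row filters should come as early as possible in the applied-steps list, ideally where they can be folded back to the source.
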
Stay tuned for the next post where we’ll dive into Data Mirroring in Microsoft Fabric.