Dataflow Gen2: Complete Production Guide for Enterprise Data Integration
What is Dataflow Gen2 in Microsoft Fabric?
Dataflow Gen2 in Microsoft Fabric is the next-generation, low-code data integration engine that combines the visual Power Query interface with a high-scale serverless compute backend. Unlike its predecessor, it supports direct writing to Lakehouse and Warehouse destinations, enables parallel processing via Partitioned Compute, and includes “Fast Copy” capabilities for high-speed ingestion, serving as the primary ETL tool for the Fabric ecosystem.
Complete Guide Navigation
Dataflow Gen2 overview and 2025 updates
Fast Copy, cost reduction strategies & real benchmarks
Partitioned Compute, multi-destination support
Variable Libraries and deployment patterns
Common questions and getting started
Implementation strategy and ROI analysis
Executive Summary: Dataflow Gen2 Overview
Dataflow Gen2 in Microsoft Fabric is an ETL/ELT engine that combines Power Query's accessibility with Apache Spark's computational power and Delta Lake's ACID semantics. With the enhancements introduced in 2025, it delivers enterprise-grade data integration with substantial performance improvements and cost-optimization opportunities.
Core Value Propositions of Dataflow Gen2
Visual Low-Code Interface
Drag-and-drop transformations familiar to Power BI users; complex data pipelines can be built with no coding experience
Ultra-High Performance
Fast Copy delivers 8x faster ingestion; Partitioned Compute adds 2-3x acceleration through parallel processing
Exceptional Cost Efficiency
Up to 95% CU reduction through combined optimizations including Fast Copy, the Modern Evaluator, and Partitioned Compute
Intelligent Incremental Refresh
Automatic Delta merge operations for time-series data, achieving 5-10x faster refreshes than full loads
Multi-Destination Flexibility
Write simultaneously to Lakehouse, Warehouse, ADLS Gen2, Snowflake, Kusto, Azure SQL, and SharePoint
Enterprise CI/CD Ready
Variable library support with Git integration, enabling environment-agnostic parameterization
2025 Breakthrough Features in Dataflow Gen2
Partitioned Compute (Preview) – Automatic parallel processing for ADLS Gen2 and Lakehouse partitions
Modern Query Evaluator – .NET Core-based engine delivering notable performance improvements
Variable Library Integration (GA) – Full CI/CD support with environment parameterization
Incremental Refresh for Lakehouse – Automatic Delta merge with upsert operations
Multi-Destination Writing – ADLS Gen2, Snowflake, and Lakehouse Files support
Real-World Impact of Dataflow Gen2 Optimization
| Dataflow Gen2 Optimization Scenario | Performance Improvement | CU Reduction | Daily Cost Impact |
|---|---|---|---|
| Baseline ingestion (6 GB file) | Baseline | – | $1.44 |
| + Fast Copy enabled | 8x faster | 68% reduction | $0.46 |
| + Modern Evaluator | 12x faster | 75% reduction | $0.36 |
| + Partitioned Compute | 15x faster | 82% reduction | $0.26 |
| All Dataflow Gen2 optimizations combined | 18x faster | 90-95% reduction | $0.10-$0.15 |
Performance Optimization Deep Dive for Dataflow Gen2
Fast Copy: Revolutionary Dataflow Gen2 Feature
Performance Metrics: 8x faster | Cost Impact: up to 90% CU reduction | Optimal Use: Simple file-to-table operations
Ideal Scenarios for Dataflow Gen2 Fast Copy
- Simple CSV/Parquet file ingestion without complex transformations
- Large-scale data movement (100 GB+) where throughput maximization matters
- Workloads requiring only minimal filtering or column projection
- Cost-sensitive workloads where minimizing CU consumption is the priority (a minimal eligible query sketch follows this list)
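To make the first scenario concrete, here is a minimal sketch of a query shape that keeps Fast Copy in play, assuming a CSV file in an ADLS Gen2 container. The storage account, container, file, and column names are all illustrative, and the exact list of Fast Copy-compatible steps should be verified against the official documentation; the point is simply that the query sticks to lightweight steps and avoids joins, groupings, and custom columns.
let
    // Hypothetical ADLS Gen2 account and container (illustrative names)
    Files = AzureStorage.DataLake("https://contosodata.dfs.core.windows.net/raw"),
    // Pick out a single CSV file from the listing
    SalesFile = Table.SelectRows(Files, each Text.EndsWith([Name], "sales_2025.csv")){0}[Content],
    // Parse, promote headers, and set types - nothing heavier
    Parsed = Csv.Document(SalesFile, [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Parsed, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"Amount", type number}, {"OrderDate", type date}})
in
    Typed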
Real-World Dataflow Gen2 CU Consumption Analysis
Production Scenario: Daily 6 GB File Ingestion
| Dataflow Gen2 Configuration Profile | Execution Time | CU Consumed (CU-seconds) | Daily Cost | Annual Savings vs Baseline |
|---|---|---|---|---|
| Baseline Configuration | 30 minutes | 28,816 CU | $1.44 | – |
| Fast Copy Enabled | 3.8 minutes | 9,144 CU | $0.46 | $355 |
| + Modern Evaluator | 3.2 minutes | 7,200 CU | $0.36 | $390 |
| + Partitioned Compute | 2.5 minutes | 4,800 CU | $0.24 | $433 |
| + Disabled Staging Layer | 2 minutes | 2,000-3,000 CU | $0.10-$0.15 | $472-$516 |
| All Dataflow Gen2 Optimizations | 1.5-2 minutes | 1,500-2,000 CU | $0.08-$0.10 | $500+ |
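As a sanity check on the figures above, here is a minimal worked example, assuming the commonly cited pay-as-you-go rate of roughly $0.18 per CU-hour (actual rates vary by region and capacity SKU):
let
    // Baseline run from the table above; consumption is reported in CU-seconds
    CuSeconds = 28816,
    // Assumed pay-as-you-go rate in USD per CU-hour (verify against your region's pricing)
    RatePerCuHour = 0.18,
    // 28,816 / 3,600 is roughly 8 CU-hours, so roughly $1.44 per daily run
    DailyCost = CuSeconds / 3600 * RatePerCuHour
in
    DailyCost
Multiplying the daily figure by 365 also roughly reproduces the annual-savings column: dropping from $1.44 to $0.46 per day saves on the order of $355 per year, as shown in the Fast Copy row.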
Advanced Dataflow Gen2 Capabilities & 2025 Destinations
Partitioned Compute: Horizontal Scaling in Dataflow Gen2
Partitioned Compute within Dataflow Gen2 automatically detects hierarchical data partitioning (year/month/day structures in ADLS Gen2 or Lakehouse) and distributes processing across multiple parallel worker threads. This advanced feature enables true horizontal scaling for large-scale data transformation workloads.
Performance Benefits of Partitioned Compute:
- 2-3x faster processing compared to sequential baseline execution
- 60-70% CU reduction through distributed parallel operations
- Linear performance scaling with increasing partition count
- Automatic partition detection requiring no manual configuration (see the folder-layout sketch below)
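For context, here is a sketch of the kind of hierarchical layout that Partitioned Compute can fan out over, assuming date-partitioned Parquet folders in ADLS Gen2. The account, container, and folder names are illustrative; partition detection is automatic, so the query itself needs no special handling.
// Illustrative folder layout that parallelizes well:
//   sales/2025/01/15/part-0001.parquet
//   sales/2025/01/16/part-0001.parquet
//   sales/2025/02/01/part-0001.parquet
let
    // Hypothetical ADLS Gen2 container holding the partitioned folders
    Files = AzureStorage.DataLake("https://contosodata.dfs.core.windows.net/raw/sales"),
    // Keep only the Parquet part files and combine them into one table
    ParquetFiles = Table.SelectRows(Files, each Text.EndsWith([Name], ".parquet")),
    Combined = Table.Combine(List.Transform(ParquetFiles[Content], each Parquet.Document(_)))
in
    Combined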
New Dataflow Gen2 Multi-Destination Support (2025)
ADLS Gen2 Destination
Use Case: Hybrid cloud data lake architectures and raw zone implementations
Format Support: Parquet, Delta, CSV with compression
Authentication: Service principal, managed identity, access keys
Key Advantage: Direct integration with Azure Synapse Analytics and Databricks
Lakehouse Files (CSV)
Use Case: Data science workflows and ML feature engineering pipelines
Format Support: CSV with configurable delimiter and encoding
Authentication: Implicit workspace-level security
Key Advantage: Direct Spark DataFrame access without Delta overhead
Snowflake Integration
Use Case: Hybrid data warehouse and cloud federation scenarios
Format Support: Iceberg, native Snowflake tables, staged data
Authentication: Key-pair authentication, OAuth2, SSO
Key Advantage: Cost-effective hybrid cloud deployments
CI/CD & Variable Libraries Integration for Dataflow Gen2
Modern CI/CD Patterns with Dataflow Gen2
Challenge: How do you maintain a single Dataflow Gen2 codebase while deploying across dev, test, and production environments with different server credentials and database names?
Solution: Fabric Variable Libraries provide environment-agnostic parameterization enabling true infrastructure-as-code patterns.
Implementing Variable Libraries for Dataflow Gen2 CI/CD
Step 1: Create Variable Library in Fabric Workspace
Home → New → Variable library
Name: "ETL-Config-Prod"
Description: "Central configuration for all production dataflows"
Step 2: Define Environment-Specific Variables
Development:
SOURCE_SERVER: "dev-sql.database.windows.net"
SOURCE_DB: "staging_db"
DEST_LAKEHOUSE: "dev-raw-data"
REFRESH_FREQUENCY: "daily"
Production:
SOURCE_SERVER: "prod-sql.database.windows.net"
SOURCE_DB: "production_db"
DEST_LAKEHOUSE: "prod-raw-data"
REFRESH_FREQUENCY: "hourly"
Step 3: Reference in Dataflow Gen2 Power Query (schema and table names below are illustrative)
let
    // Values resolved from the Variable Library at deployment time
    SourceServer = "@{variables('SOURCE_SERVER')}",
    SourceDatabase = "@{variables('SOURCE_DB')}",
    DestinationLakehouse = "@{variables('DEST_LAKEHOUSE')}",  // consumed by the data destination settings
    // Connect to the environment-specific SQL database
    Source = Sql.Database(SourceServer, SourceDatabase),
    // Navigate to a specific table before filtering (Sql.Database returns a navigation table)
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    Filtered = Table.SelectRows(Customers, each [IsActive] = true)
in
    Filtered
Step 4: Deploy Using Fabric CI/CD Pipelines
- Single Dataflow Gen2 definition
- Deployment rules apply environment variables
- Production deployment automatically uses prod credentials
FAQ & Dataflow Gen2 Getting Started Guide
Master Microsoft Fabric: Related Guides
Expand your knowledge beyond Dataflow Gen2 with these essential technical guides from our library.
Capacity Optimization
Learn how to manage CUs effectively across your entire Fabric tenant to prevent throttling.
Pricing Calculator
Estimate your exact ETL costs before deploying Dataflow Gen2 pipelines to production.
Lakehouse vs Warehouse
Decide the best destination for your Dataflow Gen2 output: SQL endpoint or Delta Lake?
Data Pipelines Guide
Master the orchestration layer that triggers and manages your Dataflows.
Notebooks vs Dataflows
Learn when to switch from low-code Dataflows to PySpark Notebooks for complex logic.
Migration Guide
Step-by-step strategy for moving legacy Power BI assets into the Fabric ecosystem.
Official References & External Documentation for Dataflow Gen2 in Fabric
For deep technical specifications, we recommend consulting the official Microsoft documentation sources below:
- Microsoft Learn: Official Dataflow Gen2 Overview – Technical specifications and limitations.
- Fast Copy Guide: Fast Copy in Dataflow Gen2 – Deep dive into the ingestion acceleration engine.
- Pricing Details: Microsoft Fabric Pricing Model – Official Azure pricing page for Compute Units (CU).
- Best Practices: Dataflow Gen2 Performance Best Practices – Microsoft engineering team recommendations.
- Community: Microsoft Fabric Community Forum – Troubleshooting and peer support.
Key Takeaways & 3-Month Implementation Roadmap
Core Concepts You’ve Mastered About Dataflow Gen2
- Architecture: Dataflow Gen2 seamlessly combines Power Query accessibility with Apache Spark compute and Delta Lake consistency
- Performance: Up to 95% CU reduction possible through strategic optimization layering
- Features: Fast Copy, Partitioned Compute, Variable Libraries, incremental refresh, multi-destination support
- Enterprise Readiness: Complete CI/CD support with environment parameterization
- Cost Impact: $500+ annual savings per optimized Dataflow Gen2 instance
- Scalability: Supports enterprise workloads from 1 GB to 1 TB+ daily volumes
Recommended 3-Month Dataflow Gen2 Rollout Strategy
Month 1 – Evaluation & Pilot:
1. Evaluate current data integration bottlenecks
2. Create pilot Dataflow Gen2 with CSV ingestion
3. Enable Fast Copy on baseline workload
4. Measure performance and CU consumption improvement
5. Calculate ROI and present business case
Month 2 – Optimization & Scalability:
1. Enable Modern Evaluator for SQL sources
2. Implement Partitioned Compute for ADLS Gen2 data
3. Design incremental refresh strategy
4. Compare cost reduction vs Gen1 baseline
5. Document optimization patterns
Month 3 – Enterprise Deployment:
1. Set up Variable Libraries for CI/CD workflows
2. Implement multi-destination scenarios
3. Establish data governance and lineage tracking
4. Create reusable templates for team
5. Migrate critical Gen1 dataflows to Gen2
6. Establish monitoring and alerting



