Data Engineering · Updated June 2026

dbt Best Practices for SQL Transformation — Complete 2026 Guide

Twelve practical rules for building dbt projects that stay maintainable as teams and data volumes grow — covering project structure, ref(), staging, testing, materializations, snapshots, macros, CI/CD, performance, packages, and the dbt Core v2.0 changes that affect every project in 2026. All patterns verified against official dbt Labs documentation and the dbt Developer Hub.

Quick Answer

The most important dbt best practices in 2026: three-layer project structure (staging → intermediate → marts), always use ref() and source(), test every primary and foreign key, match materialization to query pattern (views for staging, tables/incremental for marts), slim CI for fast pull request feedback, and dbt lint (now in beta, built into the Fusion engine and SQLFluff-compatible) for automated style enforcement. If you are on dbt Core v1.12, test v2.0 compatibility now with dbt parse --use-v2-parser before the alpha becomes the default.

A.J., Data Engineering Researcher · Source: dbt Developer Hub

Section 00

dbt Core v2.0 — What Every dbt Project Needs to Know in 2026

dbt Core v2.0 entered alpha at Snowflake Summit 2026 on June 1, 2026. It remains fully open source under the Apache 2.0 license. The headline change is that dbt Core now shares the same Rust-based runtime as the dbt Fusion engine — ending the two-engine era where Cloud and Core behaved differently. Every other 2026 best practice update flows from this foundation.

What Changed	Detail	Status (June 2026)
Rust-based parser	Significantly faster DAG compilation on large projects. Shared between Core and Fusion.	Alpha
dbt lint (beta)	Built-in linter, SQLFluff-compatible, replaces external sqlfluff step for most teams. Run with `dbt lint`.	Beta
dbt Docs v2	Rebuilt docs site with faster search, better DAG visualization, and model-level lineage views.	Alpha
Stricter project spec	Some v1.x permissive behaviours become errors in v2.0. Unused configs and ambiguous refs raise warnings.	Alpha
dbt State	State-based CI patterns (slim CI) are now a first-class primitive — documented and officially supported.	Stable

🔶

Test v2.0 Compatibility Now

Run dbt parse --use-v2-parser on your existing project against the v2.0 alpha to surface any breaking changes before the new parser becomes the default. Most valid v1.x projects pass cleanly — but unused config warnings and a small number of ambiguous ref() patterns may surface that are worth addressing now.

Note: dbt Labs and Fivetran announced their merger agreement in October 2025. The dbt product roadmap, licensing (Apache 2.0), and community packages remain unchanged. The merger affects go-to-market, not dbt Core behavior.

Rule 01

dbt Project Structure — Three Layers, One Responsibility Each

The most foundational dbt best practice is layer separation. A three-layer structure — staging, intermediate, and marts — keeps transformations readable and debuggable as projects grow. Each layer has a single job: staging cleans sources, intermediate assembles business logic, marts serve governed data to consumers.

This structure means a change in a source table affects only one staging file. Business logic changes are isolated to intermediate models. Mart restructuring does not touch the source cleanup layer. The DAG becomes predictable.

Recommended Layout

models/
├── 01_staging/
│   ├── _sources.yml        # source() definitions
│   ├── stg_customers.sql
│   └── stg_orders.sql
├── 02_intermediate/
│   ├── int_order_items.sql
│   └── int_customer_ltv.sql
└── 03_marts/
    ├── dim_customers.sql
    └── fct_orders.sql

Numbered folders keep the layer order visible in file explorers and logs. They do not control execution order: dbt derives that from the ref() and source() calls inside each model. Prefixes (stg_, int_, dim_, fct_) communicate a model’s role at a glance in compiled SQL, data catalogs, and error logs where folder context is absent.
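The folder and prefix convention is easy to check mechanically. The following is a minimal Python sketch, not part of dbt, that flags model files whose prefix does not match their layer folder. The mapping `LAYER_PREFIXES` and the function `misplaced_models` are hypothetical names, and the example path `stg_refunds.sql` is invented to show a violation.

```python
from pathlib import PurePosixPath

# Allowed model-name prefixes per layer folder, taken from the layout above.
LAYER_PREFIXES = {
    "01_staging": ("stg_",),
    "02_intermediate": ("int_",),
    "03_marts": ("dim_", "fct_"),
}


def misplaced_models(paths):
    """Return the .sql paths whose file name does not match their layer folder."""
    problems = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix != ".sql":
            continue  # skip YAML files such as _sources.yml
        prefixes = LAYER_PREFIXES.get(path.parent.name)
        # Unknown folders and wrong prefixes are both reported.
        if prefixes is None or not path.name.startswith(prefixes):
            problems.append(raw)
    return problems


example_paths = [
    "models/01_staging/_sources.yml",
    "models/01_staging/stg_orders.sql",
    "models/02_intermediate/int_customer_ltv.sql",
    "models/03_marts/fct_orders.sql",
    "models/03_marts/stg_refunds.sql",  # hypothetical file with the wrong prefix
]
print(misplaced_models(example_paths))
```

A check like this could run as a pre-commit hook. The dbt_project_evaluator package covered later in this guide provides a maintained version of the same idea.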

Field note — A.J., UIG Data Lab

The most common project structure mistake is putting business logic into staging models. Once you add a join or an aggregation to a staging model, you break the contract that staging = one-to-one with the source. The next engineer assumes stg_orders contains raw order data — and gets confused when it already excludes cancelled orders. Keep staging boring. Put all the interesting logic in intermediate.

Rule 02

Always Use ref() and source() — Never Hardcode Table Names

Hardcoding schema and table names is the fastest way to break a dbt project when environments change. ref() tells dbt about inter-model dependencies, builds the DAG, validates that upstream models exist at compile time, and swaps schemas between dev and prod targets automatically. source() does the same for raw ingestion tables, adding freshness monitoring capabilities.

-- Never do this
SELECT * FROM analytics_staging.stg_orders;

-- Always do this
SELECT * FROM {{ ref('stg_orders') }};

-- For raw source tables, use source()
SELECT * FROM {{ source('raw', 'orders') }};

With ref(), renaming schemas, cloning environments, or reorganizing models becomes a configuration change, not a search-and-replace across dozens of SQL files. In dbt Core v2.0, ambiguous or unused refs now surface as warnings — making it easier to catch stale references before they cause runtime failures.
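To make the mechanism concrete, here is a simplified Python sketch of what ref() buys you: dependency discovery, a build order, and per-target schema substitution. It is an illustration only. Real dbt renders Jinja rather than using a regular expression, and the names `find_refs`, `build_order`, `render`, and the schema `dev_aj` are assumptions made for this example.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the distinct model names referenced through ref()."""
    return sorted(set(REF_PATTERN.findall(sql)))


def build_order(models):
    """Order models so that every model comes after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def render(sql, schema):
    """Replace each ref() call with schema.model for the chosen target."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


models = {
    "fct_orders": "SELECT * FROM {{ ref('int_order_items') }}",
    "stg_orders": "SELECT * FROM {{ source('raw', 'orders') }}",
    "int_order_items": "SELECT * FROM {{ ref('stg_orders') }}",
}

print(build_order(models))
print(render(models["fct_orders"], "dev_aj"))
```

The same model text renders against `dev_aj` in development and against a production schema in deployment, which is why a hardcoded `analytics_staging.stg_orders` defeats both the DAG and environment switching.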

source() Freshness Testing

Declaring sources in _sources.yml unlocks dbt source freshness — a command that checks how recently each source was updated and warns when data goes stale. This is the earliest possible point to detect upstream pipeline failures before they propagate into downstream models and dashboards.

# _sources.yml
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
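The thresholds in this YAML translate to a simple comparison between the newest `_loaded_at` value and the current time. The sketch below shows that logic in Python under the assumption that a source older than `error_after` is an error and one older than `warn_after` is a warning. The function name `freshness_status` is hypothetical and the boundary handling is a guess, not dbt's exact implementation.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(max_loaded_at, now, warn_after_hours=12, error_after_hours=24):
    """Classify a source as pass, warn, or error from the age of its newest row."""
    age = now - max_loaded_at
    if age > timedelta(hours=error_after_hours):
        return "error"
    if age > timedelta(hours=warn_after_hours):
        return "warn"
    return "pass"


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
for hours_old in (6, 13, 30):
    loaded_at = now - timedelta(hours=hours_old)
    print(hours_old, freshness_status(loaded_at, now))
```

With the 12 and 24 hour thresholds from the YAML above, a source last loaded 13 hours ago warns and one last loaded 30 hours ago fails the run.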

Rule 03

Staging Layer Best Practices — One Source, One File, No Logic

Staging models read directly from source(), standardize column names and data types, apply very light cleaning (trim whitespace, convert units), and nothing else. No joins. No aggregations. No business rules. One staging file per source table.

This simplicity creates a stable contract. When a source changes a column name or data type, only one staging file updates. Every downstream model that uses ref('stg_orders') continues to work because the staging model’s output contract is unchanged.

{{ config(materialized='view') }}

SELECT
  id::int                AS customer_id,
  LOWER(TRIM(email))     AS customer_email,
  created_at::timestamp  AS created_at_utc,
  status::varchar        AS customer_status,
  -- amount_cents → amount in dollars at staging
  amount_cents / 100.0   AS amount_usd
FROM {{ source('raw', 'customers') }}

📌

Staging Models Are Always Views

Staging models should be materialized as views by default — set this in dbt_project.yml at the folder level so you do not need to add config to every file. Views have zero storage cost and always return the latest data. The one exception: if a staging model is called by dozens of downstream models and query time matters, materialize it as a table.

Rule 04

dbt Testing Best Practices — Test What Matters, Not Everything

dbt has four built-in generic tests: unique, not_null, accepted_values, and relationships. These cover the most common data quality assumptions. Beyond these, the dbt_expectations package adds 30+ test types, dbt_utils adds helpers such as expression_is_true, and custom singular tests handle business-rule validation.

Minimum Test Coverage for Every Model

version: 2
models:
  - name: fct_orders
    description: "Order-level fact table for revenue and conversion reporting."
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed','shipped','delivered','cancelled']
      - name: amount_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
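Each generic test is a query that returns the rows violating an assumption, and the test passes when that query returns nothing. The Python sketch below mirrors the four built-in tests on in-memory rows to show what each one asserts. The function names and sample data are hypothetical, and null handling follows the usual SQL behavior (nulls are ignored by everything except not_null).

```python
from collections import Counter


def failing_unique(rows, column):
    """Values that appear more than once (nulls ignored)."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def failing_not_null(rows, column):
    """Rows where the column is null."""
    return [r for r in rows if r[column] is None]


def failing_accepted_values(rows, column, values):
    """Non-null values outside the accepted list."""
    return [r[column] for r in rows
            if r[column] is not None and r[column] not in values]


def failing_relationships(child_rows, column, parent_rows, field):
    """Non-null child keys with no matching parent key."""
    parent_keys = {r[field] for r in parent_rows}
    return [r[column] for r in child_rows
            if r[column] is not None and r[column] not in parent_keys]


dim_customers = [{"customer_id": 1}, {"customer_id": 2}]
fct_orders = [
    {"order_id": 10, "customer_id": 1, "status": "placed"},
    {"order_id": 10, "customer_id": 2, "status": "shipped"},   # duplicate key
    {"order_id": 11, "customer_id": 3, "status": "refunded"},  # orphan + bad status
]

print(failing_unique(fct_orders, "order_id"))
print(failing_relationships(fct_orders, "customer_id", dim_customers, "customer_id"))
```

Seen this way, the minimum coverage above is four cheap assertions per fact table: keys are unique and present, every foreign key resolves, and categorical values stay inside a known set.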

Unit Tests for Complex Logic (dbt v1.8+)

Unit tests, available from dbt v1.8, let you define small input tables and expected outputs to validate transformation logic in isolation. Use them for models with complex business rules that cannot be expressed as simple constraints — for example, loyalty tier calculation or revenue attribution logic.

# models/02_intermediate/_unit_tests.yml  (unit test YAML must live under a model path)
unit_tests:
  - name: test_loyalty_tier_gold
    model: int_customer_tiers
    given:
      - input: ref('stg_customers')
        rows:
          - {customer_id: 1, lifetime_spend_usd: 1500}
    expect:
      rows:
        - {customer_id: 1, loyalty_tier: 'gold'}

⚠️

Tag Tests for Selective CI Runs

Tag your most critical tests with tags: ['critical'] so CI pipelines can run only essential checks on every pull request with dbt test --select tag:critical. Reserve the full test suite for nightly or scheduled runs. This balance reduces CI feedback time without sacrificing data quality guarantees on the models that matter most.

Rule 05

dbt Materializations Best Practices — Match Strategy to Query Pattern

Choosing the wrong materialization wastes warehouse compute or returns stale data. The rule is simple: the closer a model is to a data consumer, the more likely it benefits from physical materialization.

Materialization	When to Use	Storage Cost	Data Freshness
view	Staging and intermediate models. Low-traffic models. Anything that does not justify storage cost.	None	Always fresh: resolved at query time
table	Marts with complex joins/aggregations queried by many users. When view query time is unacceptable.	Full copy of the result, rebuilt on every run	As fresh as the last run
incremental	Large fact tables (100M+ rows) where full recompute takes more than 30 minutes.	Full table stored; each run appends or merges only new rows	As fresh as the last incremental run
ephemeral	Lightweight CTE helpers referenced once or twice. Should not persist.	None: inlined into the calling model as a CTE	N/A
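The table can be read as a small decision procedure. The Python sketch below encodes it, assuming the incremental rule requires both conditions (at least 100 million rows and a full run longer than 30 minutes). The function `choose_materialization` and its parameters are hypothetical, and real projects will weigh cost and freshness needs that a four-line rule cannot capture.

```python
def choose_materialization(layer, row_count=0, full_run_minutes=0.0, is_helper=False):
    """Suggest a default materialization from the rules in the table above."""
    if is_helper:
        return "ephemeral"  # lightweight CTE helper, never persisted
    if layer in ("staging", "intermediate"):
        return "view"  # zero storage, resolved at query time
    # Marts: switch to incremental only when a full rebuild is too slow.
    if row_count >= 100_000_000 and full_run_minutes > 30:
        return "incremental"
    return "table"


print(choose_materialization("staging"))
print(choose_materialization("marts", row_count=5_000_000, full_run_minutes=4))
print(choose_materialization("marts", row_count=250_000_000, full_run_minutes=45))
```

Treat the output as a starting default. A staging view that dozens of models read, for example, may still deserve to be a table.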

Set Layer Defaults in dbt_project.yml

# dbt_project.yml
models:
  your_project:
    01_staging:
      +materialized: view
    02_intermediate:
      +materialized: view
    03_marts:
      +materialized: table

Setting defaults at the folder level means individual model files only need to declare materialization when they deviate from the layer default — keeping SQL files clean and reducing configuration drift.

Rule 06

dbt Snapshots Best Practices — Slowly Changing Dimensions Without MERGE Complexity

Snapshots capture how rows change over time — implementing SCD Type 2 without hand-written MERGE statements. dbt handles the effective date columns (dbt_valid_from, dbt_valid_to), the hash comparison, and the history table, consistently across all environments.

{% snapshot snap_customer_status %}

{{ config(
  target_schema = 'snapshots',
  unique_key    = 'customer_id',
  strategy      = 'check',
  check_cols    = ['status', 'subscription_tier', 'account_owner']
) }}

SELECT
  customer_id,
  status,
  subscription_tier,
  account_owner,
  updated_at
FROM {{ ref('stg_customers') }}

{% endsnapshot %}
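For readers who want to see what the check strategy does on each run, here is a simplified Python sketch of SCD Type 2 bookkeeping. It compares the tracked columns against the currently open version of each key, closes the old version, and opens a new one. The function `apply_snapshot` and the sample rows are hypothetical. Real dbt snapshots also maintain columns such as dbt_scd_id and dbt_updated_at and can handle hard deletes, which this sketch omits.

```python
from datetime import datetime


def apply_snapshot(history, current_rows, unique_key, check_cols, now):
    """Return the history after one snapshot run using a column-check strategy."""
    # The open version of each key is the row with no end date.
    open_rows = {r[unique_key]: r for r in history if r["dbt_valid_to"] is None}
    result = [dict(r) for r in history]  # copy so the input is not mutated
    for row in current_rows:
        key = row[unique_key]
        existing = open_rows.get(key)
        if existing is not None and all(existing[c] == row[c] for c in check_cols):
            continue  # no tracked column changed
        if existing is not None:
            # Close the previously open version.
            for r in result:
                if r[unique_key] == key and r["dbt_valid_to"] is None:
                    r["dbt_valid_to"] = now
        # Open a new version that is valid from this run onward.
        result.append({**row, "dbt_valid_from": now, "dbt_valid_to": None})
    return result


t1, t2, t3 = datetime(2026, 1, 1), datetime(2026, 2, 1), datetime(2026, 3, 1)
cols = ["status", "subscription_tier"]

history = apply_snapshot([], [{"customer_id": 1, "status": "trial", "subscription_tier": "free"}], "customer_id", cols, t1)
history = apply_snapshot(history, [{"customer_id": 1, "status": "active", "subscription_tier": "pro"}], "customer_id", cols, t2)
history = apply_snapshot(history, [{"customer_id": 1, "status": "active", "subscription_tier": "pro"}], "customer_id", cols, t3)

for version in history:
    print(version["status"], version["dbt_valid_from"], version["dbt_valid_to"])
```

The third run adds nothing because no tracked column changed, which is why snapshot tables grow with the number of changes rather than the number of runs.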

✅

Good Snapshot Candidates

Customer status and subscription tier
Account ownership and territory assignments
Pricing tier and discount category
Employee department and role
Product lifecycle stage

❌

Poor Snapshot Candidates

High-frequency event tables (use event logs instead)
Tables without a reliable updated_at column, if you plan to use strategy='timestamp' (use strategy='check' instead)
Tables with hundreds of columns (select and check only the columns whose history you need)
Tables where full history exists in source (no need for CDC)

Rule 07

dbt Naming Conventions — Self-Documenting Names That Work in Any Context

dbt model names appear in compiled SQL, data catalogs, logs, error messages, BI tool table pickers, and Data Agent queries. A well-chosen name communicates both the layer and the business concept without needing surrounding context.

Prefix	Layer	Examples
`stg_`	Staging	stg_customers, stg_orders, stg_events
`int_`	Intermediate	int_order_items, int_customer_ltv
`dim_`	Dimension (mart)	dim_customers, dim_products, dim_dates
`fct_`	Fact (mart)	fct_orders, fct_sessions, fct_revenue
`snap_`	Snapshot	snap_customer_status, snap_account_tier
`mrt_`	Wide mart / OBT	mrt_customer_360, mrt_product_summary

In marts, use business vocabulary, not source system vocabulary. dim_accounts is better than dim_sf_account because it describes what the model represents to a business user, not which system it came from. Column names should follow the same principle: customer_id, not cust_pk or c_id.

Rule 08

dbt Documentation Best Practices — Code and Docs in the Same File

Documentation that lives in a separate system drifts from reality within weeks. dbt documentation lives inside schema.yml files alongside test definitions — the same place engineers look when maintaining a model. Run dbt docs generate && dbt docs serve to produce a searchable, navigable data catalog from what is already in the repo.

models:
  - name: fct_orders
    description: >
      Order-level fact table. Source of truth for revenue and conversion
      reporting. Excludes test orders (is_test = true) and internal
      employee orders (customer_type = 'employee').
    columns:
      - name: order_id
        description: "Surrogate primary key for the order."
      - name: customer_id
        description: "FK to dim_customers.customer_id."
      - name: order_placed_at
        description: "UTC timestamp when the order was submitted."
      - name: gross_revenue_usd
        description: "Total order value before returns, refunds, or discounts."
      - name: net_revenue_usd
        description: "Gross revenue minus approved refunds. Use this for P&L reporting."

The description on gross_revenue_usd vs net_revenue_usd is exactly the kind of business context that prevents analysts from picking the wrong column. Column-level descriptions like this are also read by dbt Docs v2 and surface in Data Agent semantic context when agents query the warehouse.

💡

dbt Docs v2 (Alpha — dbt Core v2.0)

dbt Docs v2, part of the dbt Core v2.0 alpha, includes faster search, richer DAG visualization, and model-level lineage views that show which dashboards and downstream models depend on each node. For large projects with hundreds of models, the improved navigation alone is worth testing the alpha on a non-production branch.

Rule 09

dbt Macros Best Practices — DRY SQL at Scale

When the same SQL pattern appears in three or more models, it is a macro candidate. Macros centralize complex expressions, ensure consistent logic across models, and make changes in one place rather than many. Common macro use cases: surrogate key generation, date spine creation, conditional filter logic, data masking, and recurring CTE patterns.

-- macros/generate_surrogate_key.sql
{% macro generate_surrogate_key(column_list) %}
  {{ dbt_utils.generate_surrogate_key(column_list) }}
{% endmacro %}

-- Usage in any model
SELECT
  {{ generate_surrogate_key(['order_id', 'product_id']) }} AS order_line_id,
  order_id,
  product_id,
  quantity
FROM {{ ref('stg_order_lines') }}
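It helps to know what a hashed surrogate key is doing before wrapping it in a macro. The Python sketch below follows the general approach such macros take: cast each column to text, substitute a placeholder for nulls, join with a separator, and hash with MD5. The placeholder and separator shown are assumptions modeled on dbt_utils, and the exact text casting differs by warehouse, so keys computed here are not guaranteed to match keys computed in SQL.

```python
import hashlib

# Assumed null placeholder; it keeps NULL distinct from an empty string.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(row, columns):
    """Hash the listed columns into one deterministic 32-character key."""
    parts = [NULL_PLACEHOLDER if row[c] is None else str(row[c]) for c in columns]
    # The separator stops ('1', '23') and ('12', '3') from colliding.
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


line = {"order_id": 10, "product_id": 7}
print(surrogate_key(line, ["order_id", "product_id"]))
```

Centralizing this in one macro matters because every detail (column order, separator, null handling) changes the resulting key, and two models that disagree on any of them will not join.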

When Not to Write a Macro

Macros add indirection that can confuse engineers unfamiliar with Jinja. If SQL logic only appears once or twice, a comment explaining the pattern is simpler than a macro. Reserve macros for genuinely repeated patterns — the three-occurrence rule is a reliable threshold.

Rule 10

dbt Performance Best Practices — Incremental Models and Physical Design

dbt performance is largely determined by two things: how much data moves through the DAG on each run, and how well the warehouse can scan and join that data. Both are under your control through model design and materialization choices.

Incremental Model Pattern

{{ config(
  materialized         = 'incremental',
  unique_key           = 'order_id',
  incremental_strategy = 'merge'
) }}

SELECT
  order_id,
  customer_id,
  order_placed_at,
  amount_usd
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
  WHERE order_placed_at >= (
    SELECT MAX(order_placed_at) FROM {{ this }}
  )
{% endif %}
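The incremental pattern above has two parts: a watermark filter that limits how many rows are read, and a merge that upserts those rows by unique_key. The Python sketch below reproduces both on in-memory rows. The function `incremental_merge` and the sample data are hypothetical, and dates are ISO strings so that plain comparison orders them correctly.

```python
def incremental_merge(target, source, unique_key, watermark_col):
    """Upsert source rows at or after the target's high-water mark.

    Returns the merged table and the number of source rows processed.
    """
    if target:
        watermark = max(r[watermark_col] for r in target)
        new_rows = [r for r in source if r[watermark_col] >= watermark]
    else:
        new_rows = list(source)  # first run behaves like a full refresh
    merged = {r[unique_key]: r for r in target}
    for r in new_rows:
        merged[r[unique_key]] = r  # insert new keys, overwrite existing ones
    return list(merged.values()), len(new_rows)


target = [
    {"order_id": 1, "order_placed_at": "2026-06-01", "amount_usd": 20.0},
    {"order_id": 2, "order_placed_at": "2026-06-02", "amount_usd": 35.0},
]
source = [
    {"order_id": 1, "order_placed_at": "2026-06-01", "amount_usd": 20.0},
    {"order_id": 2, "order_placed_at": "2026-06-02", "amount_usd": 30.0},  # corrected
    {"order_id": 3, "order_placed_at": "2026-06-03", "amount_usd": 12.5},
]

merged, processed = incremental_merge(target, source, "order_id", "order_placed_at")
print(processed, [r["order_id"] for r in merged])
```

Using `>=` re-reads rows that share the latest timestamp, and the merge on unique_key keeps that from creating duplicates. A row that arrives late with an older order_placed_at would still be skipped, which is why many teams subtract a lookback window from the watermark or schedule a periodic full refresh.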

Performance Practice	Impact	When to Apply
Incremental models	Eliminates full table recompute on large facts	Facts above 100M rows or 30+ min full run
Filter early in SQL	Reduces data scanned by downstream steps	Any model with a known partition or date filter
Avoid SELECT *	Prevents accidental column explosion in output	All models — be explicit about columns
Ephemeral for helpers	Avoids unnecessary table writes	CTEs used in one or two places
cluster_by / partition on warehouses	Enables partition pruning for common filters	Large tables filtered frequently by date or key

On Microsoft Fabric, Databricks, and Snowflake, aligning cluster or partition columns with common WHERE clause patterns can cut the data scanned per query substantially, because the engine skips files or partitions that cannot match the filter instead of scanning all rows. For Fabric specifically, OPTIMIZE and ZORDER on the Delta table that underlies the model are applied separately after the dbt run. See Fabric Data Warehouse Optimization for those patterns.

Rule 11

dbt Packages Best Practices — Community Tools That Save Hundreds of Lines

dbt packages provide pre-built macros, tests, and models that the community maintains. Adding one takes two lines in packages.yml and one dbt deps command. The packages.yml below installs three of them; dbt_project_evaluator, the fourth package described in this section, is added the same way.

# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
  - package: calogica/dbt_expectations
    version: 0.10.4
  - package: dbt-labs/codegen
    version: 0.12.1

🛠️

dbt_utils

dbt_utils.date_spine() — generate complete date tables
dbt_utils.generate_surrogate_key() — consistent hashed keys
dbt_utils.union_relations() — union models dynamically
expression_is_true — custom boolean test

🧪

dbt_expectations

expect_column_values_to_be_of_type
expect_column_values_to_be_between
expect_table_row_count_to_be_between
30+ Great Expectations-style tests directly in YAML

⚡

dbt-codegen

Auto-generate schema.yml stubs from source tables
Auto-generate staging model SQL from source definitions
Saves hours on project initialization

📊

dbt_project_evaluator

Enforces dbt best practices as automated tests
Flags models missing tests, descriptions, or correct prefixes
Generates a compliance report for every run
Turns your conventions into CI gates

Rule 12

dbt CI/CD Best Practices — Slim CI, Linting, and Production Safeguards

Manual checks do not scale as teams and projects grow. CI/CD integration turns dbt best practices into automated gates that run on every pull request. The goal is fast feedback on what changed, not running thousands of models on every commit.

Slim CI — Only Run What Changed

Slim CI uses the state:modified selector together with --defer and a --state path that points at the manifest from the previous production run. Only modified models and their downstream dependents are built and tested, which sharply reduces CI run time on large projects.

# GitHub Actions example: slim CI
- name: dbt slim CI
  run: |
    # Download the production manifest from your artifact store
    dbt deps
    dbt lint                          # dbt Core v2.0 beta linter
    dbt build \
      --select state:modified+ \
      --defer \
      --state ./prod-manifest \
      --target ci
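To show what the `state:modified+` selection amounts to, here is a simplified Python sketch: compare each model against the production manifest, then walk the DAG downstream from every changed model. It compares a single checksum per model, whereas dbt's state comparison also considers configs and other properties. The function `modified_plus`, the checksums, and the DAG are hypothetical.

```python
from collections import deque


def modified_plus(prod_checksums, pr_checksums, parents):
    """Select models that are new or changed in the PR, plus all descendants."""
    modified = {m for m, c in pr_checksums.items() if prod_checksums.get(m) != c}
    # Invert the parent map so the graph can be walked downstream.
    children = {}
    for child, upstream in parents.items():
        for up in upstream:
            children.setdefault(up, set()).add(child)
    selected, queue = set(modified), deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


parents = {
    "stg_orders": [],
    "stg_customers": [],
    "int_order_items": ["stg_orders"],
    "fct_orders": ["int_order_items"],
    "dim_customers": ["stg_customers"],
}
prod = {name: "v1" for name in parents}
pr = dict(prod, int_order_items="v2")  # the PR edits one intermediate model

selected = modified_plus(prod, pr, parents)
print(sorted(selected))
```

Here only two of the five models are rebuilt. With --defer, the unselected upstream model stg_orders is read from the production schema instead of being rebuilt in the CI schema.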

dbt lint (Beta — dbt Core v2.0)

dbt lint is built into the dbt Core v2.0 alpha and is SQLFluff-compatible. For projects already using SQLFluff as an external step, dbt lint replaces it with a faster, dbt-aware implementation that understands Jinja templating natively. Test it with dbt lint --select path:models/.

Full CI/CD Pipeline Structure

# Pull request pipeline
dbt deps                              # Install packages
dbt lint                              # Style and convention checks (v2.0 beta)
dbt parse --use-v2-parser             # Validate v2.0 compatibility
dbt build --select state:modified+ --defer --state ./prod-manifest   # Build and test modified models only

# Nightly production run
dbt source freshness                  # Check upstream data freshness
dbt build --full-refresh --select tag:daily_full  # Full refresh for daily models
dbt test --select tag:critical        # Run all critical tests

✅

dbt_project_evaluator as a CI Gate

Add dbt_project_evaluator to your CI pipeline. It runs as a set of dbt models and tests, generating rows for every best practice violation it detects: untested models, missing descriptions, incorrect prefixes, direct source references in marts, and so on. Treating these as CI failures enforces your conventions automatically rather than relying on code review to catch them.

FAQ

dbt Best Practices — Frequently Asked Questions

What are dbt best practices for project structure?

Use a three-layer architecture: staging (stg_*) for source cleanup, intermediate (int_*) for business logic assembly, and marts (dim_*, fct_*) for governed serving models. Use numbered folders (01_staging, 02_intermediate, 03_marts) to keep the layer order visible; actual execution order comes from ref() and source(). Set materialization defaults per folder in dbt_project.yml. This separation means changes in one layer do not cascade unpredictably to others.

Should I always use ref() in dbt?

Yes — always. Use {{ ref('model_name') }} for inter-model references and {{ source('schema', 'table') }} for raw source tables. dbt uses ref() calls to build the dependency DAG, validate upstream models at compile time, and swap schemas between dev and prod targets automatically. Hardcoded schema.table references break environment switching and prevent dbt from tracking lineage correctly.

What is the best dbt materialization strategy?

Views for staging and intermediate models — zero storage, always fresh. Tables for marts that feed dashboards or heavy ad-hoc queries. Incremental models with merge strategy for fact tables above 100 million rows where full recompute exceeds 30 minutes. Ephemeral for lightweight CTE helpers referenced once or twice. Set defaults per folder in dbt_project.yml so individual files only declare exceptions.

What dbt tests should I have at minimum?

Every primary key needs unique and not_null. Every foreign key needs relationships. Important categorical columns need accepted_values. Numeric business metrics need dbt_utils.expression_is_true (e.g. amount_usd >= 0). For complex business rules, use unit tests (dbt v1.8+) or custom singular tests. Tag critical tests so CI can run only them on PRs with dbt test --select tag:critical.

What is dbt Core v2.0 and what changes for existing projects?

dbt Core v2.0 entered alpha at Snowflake Summit 2026 in June 2026. It is open source (Apache 2.0) and shares the same Rust-based runtime as the Fusion engine. Key additions: faster Rust parser, dbt lint (beta, SQLFluff-compatible), dbt Docs v2, and stricter project spec enforcement. Most valid v1.x projects pass cleanly — test yours with dbt parse --use-v2-parser before v2.0 becomes the default.

What is slim CI in dbt?

Slim CI runs dbt commands only on models modified in a pull request using --select state:modified+ with --defer and a production state artifact. Instead of rebuilding all models on every PR, only changed models and their downstream dependents are built and tested. On large projects this can cut CI time substantially, since a PR that touches a handful of models no longer rebuilds thousands. dbt State is now a first-class officially documented primitive in dbt Core.

When should I use dbt snapshots?

Use snapshots for SCD Type 2 — tracking how entity attributes change over time. Good candidates: customer status, subscription tier, account owner, product lifecycle stage. Use strategy='check' with check_cols listing the columns to monitor. dbt manages dbt_valid_from, dbt_valid_to, and the history table automatically across environments.

⚠ Accuracy Disclaimer

dbt practices are verified against the official dbt Developer Hub and dbt Labs release notes through June 2026. dbt Core v2.0, dbt lint, and dbt Docs v2 are in alpha/beta — verify current status at docs.getdbt.com before adopting in production. UIG Data Lab is an independent publication, not affiliated with dbt Labs.

A.J., Data Engineering Researcher & Technical Writer · UIG Data Lab