dbt Best Practices: 10 Rules for Scalable, Modular SQL Transformation

dbt (data build tool) brings software engineering discipline to analytics engineering. It lets data teams transform raw warehouse tables into clean, well‑tested models using SQL plus version control, automated tests, documentation, and CI/CD. As a result, dbt has become the de facto standard for modern data teams working on Snowflake, BigQuery, Redshift, Databricks, Fabric, and Postgres.

However, tools alone do not guarantee reliability. Without clear dbt best practices, projects can still turn into fragile, untested SQL jungles. This guide walks through 10 practical rules used by experienced teams to structure dbt projects, keep SQL modular, enforce data quality, and ship fast without breaking dashboards.

dbt Project Structure Best Practices

The most important dbt best practice is to separate models into clear layers. A three‑layer structure—staging, intermediate, and marts—keeps transformations understandable and debuggable as projects grow. Each layer has a single responsibility so changes in one place do not cascade unpredictably through the rest of the DAG.

Use folders and naming conventions together so anyone opening the repo can instantly see where raw source cleanup happens, where business logic lives, and which models are safe to connect BI tools to.

Recommended 3‑Layer Layout

models/
  01_staging/
    stg_customers.sql
    stg_orders.sql
  02_intermediate/
    int_order_items.sql
    int_customer_ltv.sql
  03_marts/
    dim_customers.sql
    fct_orders.sql
      

Numbered folders keep execution order obvious, while prefixes like stg_, int_, dim_, and fct_ make the model’s role clear even when viewed out of context, such as in compiled SQL or logs.

dbt ref() Best Practices

Hard‑coding schema and table names is the fastest way to break a dbt DAG. Instead, always reference other models with ref(). dbt uses these calls to build the dependency graph, validate that upstream models exist, and swap schemas between development and production targets without changing the SQL itself.

Always Prefer ref() Over Raw Table Names

-- ❌ Anti‑pattern
SELECT *
FROM analytics_staging.stg_orders

-- ✅ Best practice
SELECT *
FROM {{ ref('stg_orders') }}
      

With this approach, renaming schemas, cloning environments, or reorganizing models becomes mostly configuration work instead of a dangerous search‑and‑replace across dozens of SQL files.

dbt Staging Layer Best Practices

Staging models should be boring on purpose. They read directly from sources, standardize column names, fix data types, and apply very light cleaning such as trimming whitespace or converting cents to dollars. They should not contain joins, aggregations, or complicated business rules.

By keeping staging models simple and one‑to‑one with upstream tables, you create a stable contract between raw ingestion and the rest of your dbt project. When a source adds a column or changes a data type, only a single staging file needs to be updated.
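
The source() call in a staging model resolves against a source definition kept in YAML. Here is a minimal sources.yml sketch for the customers example below; the loaded_at_field and freshness thresholds are assumptions to adapt to your own ingestion setup.

version: 2

sources:
  - name: raw
    schema: raw
    tables:
      - name: customers
        loaded_at_field: _loaded_at   # assumed ingestion timestamp column
        freshness:
          warn_after: {count: 24, period: hour}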

Example Simple Staging Model

{{ config(materialized='view') }}

SELECT
  id::int           AS customer_id,
  email::varchar    AS customer_email,
  created_at::timestamp AS created_at_utc,
  status::varchar   AS customer_status
FROM {{ source('raw', 'customers') }}
      

dbt Testing Best Practices

Testing is one of the biggest reasons to adopt dbt. At minimum, every important model should enforce uniqueness and non‑null constraints on primary keys, plus referential integrity on foreign keys. Beyond that, targeted business‑rule tests catch issues that simple schema checks cannot see.

Instead of trying to test everything, focus on the models and columns that feed core metrics or critical dashboards. It is better to have strong coverage on 20% of the warehouse that actually drives decisions than thin coverage everywhere.

Example schema.yml With Core Tests

version: 2

models:
  - name: dim_customers
    description: "Customer dimension for reporting."
    columns:
      - name: customer_id
        description: "Primary key for the customer."
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
      - name: country_code
        tests:
          - accepted_values:
              values: ['US','CA','GB','AU','DE']
      

Generic tests like unique, not_null, accepted_values, and relationships cover most structural assumptions. More complex logic, such as “churned customers cannot have recent invoices”, is best implemented as custom tests or unit tests.
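
A business rule like that can live as a singular test: a SQL file under tests/ that fails whenever it returns rows. The sketch below illustrates the idea; the model and column names (fct_invoices, customer_status, invoice_date), the 30‑day window, and the DATEADD syntax are assumptions that vary by project and warehouse.

-- tests/assert_churned_customers_have_no_recent_invoices.sql
SELECT
  c.customer_id
FROM {{ ref('dim_customers') }} AS c
JOIN {{ ref('fct_invoices') }} AS i
  ON i.customer_id = c.customer_id
WHERE c.customer_status = 'churned'
  AND i.invoice_date >= DATEADD('day', -30, CURRENT_DATE)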

dbt Materializations Best Practices

Choosing the right materialization has a major impact on cost and performance. A simple but effective rule is to use views for most staging and intermediate models, and to reserve tables or incremental models for marts that feed dashboards or heavy ad‑hoc querying.

Views keep storage usage low and always return the latest data, because they are resolved at query time. Tables cost more storage but avoid recomputing expensive logic repeatedly. Incremental models shine when dealing with very large facts that grow by appends, such as events or orders.

Layer‑Based Defaults

  • Staging: view
  • Intermediate: view (or ephemeral for small helper models)
  • Marts: table or incremental, depending on size and access patterns
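
These defaults are easiest to enforce once at the folder level in dbt_project.yml instead of in every model file. A minimal sketch, assuming the project is named analytics and uses the folder layout shown earlier:

models:
  analytics:
    01_staging:
      +materialized: view
    02_intermediate:
      +materialized: view
    03_marts:
      +materialized: table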

dbt Snapshots Best Practices

Many analytics use cases need Slowly Changing Dimension (SCD) history to track how entity attributes change over time. dbt snapshots handle this pattern by periodically comparing current rows to stored versions and writing new records when selected columns change.

Instead of hand‑crafted MERGE statements, you declare how to detect changes, and dbt manages effective dates and history tables consistently across environments.

Example Snapshot for Customer Status

{% snapshot snap_customer_status %}

{{ config(
  target_schema='snapshots',
  unique_key='customer_id',
  strategy='check',
  check_cols=['status','tier']
) }}

SELECT
  customer_id,
  status,
  tier,
  updated_at
FROM {{ ref('stg_customers') }}

{% endsnapshot %}
      

This pattern is ideal for tracking account states, subscription plans, pricing tiers, and other business attributes where historical context matters for analytics.
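
dbt adds dbt_valid_from and dbt_valid_to columns to every snapshot, so point‑in‑time questions become simple range filters. A sketch of an as‑of query against the snapshot above; the date literal is only an example:

SELECT
  customer_id,
  status,
  tier
FROM {{ ref('snap_customer_status') }}
WHERE dbt_valid_from <= '2024-01-01'
  AND (dbt_valid_to > '2024-01-01' OR dbt_valid_to IS NULL)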

dbt Naming Conventions Best Practices

Clear, consistent naming conventions make dbt projects navigable. Names should communicate both the layer and the business concept. Prefixes are particularly useful because they remain visible in compiled SQL, logs, and data catalogs.

Common Prefix Patterns

  • stg_ for staging models
  • int_ for intermediate models
  • dim_ for dimensions
  • fct_ or fact_ for fact tables
  • snap_ for snapshot definitions

Within each layer, aim for names that match how business stakeholders describe concepts, not just how source systems label them. This makes it easier to map analytics questions to the right models.

dbt Documentation Best Practices

Good documentation turns a dbt repository into a self‑serve data catalog. Descriptions on models and columns explain not just what a field is called, but how it should be interpreted and when it should be used.

Keeping documentation close to the code, inside schema.yml files, reduces drift. Running dbt docs generate and hosting the docs site gives analysts and stakeholders a visual map of the DAG and a searchable dictionary of metrics and entities.
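
Generating and browsing the docs site takes two commands; dbt docs serve is meant for local use, while hosted options range from static file hosting to dbt Cloud.

dbt docs generate   # compiles the project and writes manifest.json and catalog.json
dbt docs serve      # serves the docs site locally, on port 8080 by default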

Example Well‑Documented Model

models:
  - name: fct_orders
    description: "Order‑level fact table used for revenue and conversion reporting."
    columns:
      - name: order_id
        description: "Primary key for the order."
      - name: customer_id
        description: "Reference to dim_customers.customer_id."
      - name: order_date
        description: "Date the order was placed (UTC)."
      - name: gross_revenue
        description: "Total order value before discounts and refunds."
      

dbt Macros Best Practices

Whenever a piece of SQL logic is repeated in several models, it is a good candidate for a macro. Macros help keep code DRY, reduce copy‑paste bugs, and centralize complex expressions that are hard to maintain in many places.

Common macro use cases include date spines, surrogate key generation, conditional filters, common CTE patterns, or masking sensitive data. Start small: once the same pattern appears three times, promote it into a macro.

Example Surrogate Key Macro

-- macros/surrogate_key.sql
-- Hashes the pipe-delimited concatenation of the given columns.
-- Note: if any column is NULL, the concatenation (and therefore the hash)
-- is NULL in most warehouses, so coalesce nullable columns before passing them in.
{% macro surrogate_key(columns) %}
  md5({{ columns | join(" || '|' || ") }})
{% endmacro %}
      

Using this macro in multiple models keeps your key strategy consistent and makes it trivial to change hashing behavior later if needed.
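
Calling the macro inside a model looks like the sketch below; the model and column names are illustrative.

-- models/03_marts/fct_orders.sql (excerpt)
SELECT
  {{ surrogate_key(['customer_id', 'order_id']) }} AS order_sk,
  customer_id,
  order_id
FROM {{ ref('stg_orders') }}

Many teams eventually switch to dbt_utils.generate_surrogate_key, which handles NULLs and type casting for you; the point is to pick one key strategy and use it everywhere.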

dbt CI/CD and Project Health Best Practices

As teams and projects grow, manual checks are not enough. Integrating dbt into a CI/CD pipeline ensures that every pull request runs compilation and targeted tests before changes reach production. This drastically reduces the risk of broken models or invalid data silently shipping to dashboards.

Many teams pair dbt with tools like SQLFluff for style and linting, plus project evaluators that enforce conventions around structure, naming, and dependencies. The goal is to turn your best practices into automated gates rather than tribal knowledge.

Example CI Steps

# Pseudocode for a CI pipeline
sqlfluff lint models/              # SQL style checks
dbt deps                           # Install packages
dbt compile                        # Validate DAG and refs
dbt test --select tag:critical     # Run critical tests only
      

Running the full test suite on a schedule (for example, nightly) and a smaller critical subset on every commit strikes a good balance between safety and feedback speed.
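
Wired into a real pipeline, those steps might look like the following GitHub Actions sketch. The adapter (dbt-snowflake), the ci target, and the workflow layout are assumptions to adapt; credentials would come from repository secrets and a profiles.yml checked in for CI use.

# .github/workflows/dbt_ci.yml
name: dbt-ci

on:
  pull_request:

jobs:
  dbt-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt and SQLFluff
        run: pip install dbt-snowflake sqlfluff sqlfluff-templater-dbt
      - name: Lint SQL
        run: sqlfluff lint models/
      - name: Compile and run critical tests
        env:
          DBT_PROFILES_DIR: ./ci   # assumed location of a CI-only profiles.yml
        run: |
          dbt deps
          dbt compile --target ci
          dbt test --select tag:critical --target ci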

dbt Performance Best Practices

dbt performance scales with warehouse optimization techniques. Use incremental models for large fact tables (>1B rows), apply cluster_by or partitioning (depending on your warehouse) to the date columns used in common filters, and use ephemeral materialization for lightweight helper models that don’t need persistence.

Filter early to reduce data volume passing through the DAG. Avoid SELECT * in upstream models. Monitor query costs with warehouse query history and prioritize incremental models where recompute time exceeds 30 minutes.

Production Incremental Model

{{ config(
  materialized='incremental',
  unique_key='order_id',
  incremental_strategy='merge'
) }}

-- In real projects, list columns explicitly instead of SELECT *
SELECT * FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is already loaded
  WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
  

dbt Packages Best Practices

Leverage community packages like dbt_utils and dbt_expectations for battle-tested utilities: add them to packages.yml, then run dbt deps to install them. A few well-chosen packages can easily replace 100+ lines of custom SQL per project.

packages.yml Production Template

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: calogica/dbt_expectations
    version: 0.10.3
  - package: dbt-labs/codegen
    version: 0.12.0
  

Use {{ dbt_utils.date_spine() }} for date tables, dbt_expectations.expect_column_values_to_be_of_type for stricter type validation, and dbt-codegen to auto-generate source and model YAML boilerplate.
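
For example, a date dimension can be generated entirely from the package macro; the model name and date range here are assumptions.

-- models/02_intermediate/int_date_spine.sql
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="cast('2030-01-01' as date)"
) }}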

dbt Best Practices FAQ

This section answers common questions engineers ask when applying dbt best practices in real projects.

Project Structure & Layers

Should I use three or four layers in my dbt project?

Three layers (staging, intermediate, marts) are enough for most teams. A fourth “raw” or “bronze” layer can be useful when you mirror ingestion tables inside dbt, but many warehouses already handle that separation outside dbt. Start with three, then add a separate raw layer only if you have a clear need.

How small should a dbt model be?

A useful rule of thumb is that a model should comfortably fit on one screen and do one logical thing. If a file grows beyond 150–200 lines of SQL or mixes several types of logic, consider splitting it into multiple models so tests and ownership stay focused.

Testing & Data Quality

How many dbt tests are too many?

You do not need every generic test on every column. Focus on primary keys, foreign keys, and the columns that power core metrics. Add business‑rule tests where mistakes would be expensive, such as double‑counted revenue or invalid lifecycle states. Beyond that, additional tests yield diminishing returns relative to warehouse cost.
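
Tagging the highest-value tests also makes it easy for CI to run just that subset, matching the dbt test --select tag:critical step shown earlier. A sketch of the schema.yml syntax:

columns:
  - name: customer_id
    tests:
      - unique:
          config:
            tags: ['critical']
      - not_null:
          config:
            tags: ['critical']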

Should I use unit tests in dbt?

Unit tests are powerful for complex logic that cannot be expressed as a simple constraint. They let you define small input tables and expected outputs. They take more effort to set up, so reserve them for the models that implement the most critical or tricky business rules.
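
Native unit tests (available from dbt 1.8) are defined in YAML next to the model. A minimal sketch, with the model, input rows, and columns all assumed for illustration:

unit_tests:
  - name: test_customer_ltv_ignores_refunded_orders
    model: int_customer_ltv
    given:
      - input: ref('stg_orders')
        rows:
          - {customer_id: 1, amount: 100, status: 'completed'}
          - {customer_id: 1, amount: 50, status: 'refunded'}
    expect:
      rows:
        - {customer_id: 1, lifetime_value: 100}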

Performance & Materializations

When should I switch a view to a table in dbt?

Consider materializing as a table when a model is queried very frequently, has expensive joins or aggregations, or sits directly under important BI dashboards. Monitor query times and warehouse costs; if a single view is responsible for a large share of compute, turning it into a table or incremental model can pay off quickly.

What are dbt performance best practices?

Filter early to reduce data volume, avoid unnecessary SELECT *, keep staging models lean, and use incremental models for large append‑only facts. On platforms that support clustering or partitioning, align those settings with common filter patterns on date or key columns.

