dbt Best Practices: 10 Rules for Scalable, Modular SQL Transformation
dbt (data build tool) brings software engineering discipline to analytics engineering. It lets data teams transform raw warehouse tables into clean, well-tested models using SQL plus version control, automated tests, documentation, and CI/CD. As a result, dbt has become the default standard for modern data teams working on Snowflake, BigQuery, Redshift, Databricks, Fabric, and Postgres.
However, tools alone do not guarantee reliability. Without clear dbt best practices, projects can still turn into fragile, untested SQL jungles. This guide walks through 10 practical rules used by experienced teams to structure dbt projects, keep SQL modular, enforce data quality, and ship fast without breaking dashboards.
dbt Project Structure Best Practices
The most important dbt best practice is to separate models into clear layers. A three-layer structure (staging, intermediate, and marts) keeps transformations understandable and debuggable as projects grow. Each layer has a single responsibility so changes in one place do not cascade unpredictably through the rest of the DAG.
Use folders and naming conventions together so anyone opening the repo can instantly see where raw source cleanup happens, where business logic lives, and which models are safe to connect BI tools to.
Recommended 3-Layer Layout
models/
  01_staging/
    stg_customers.sql
    stg_orders.sql
  02_intermediate/
    int_order_items.sql
    int_customer_ltv.sql
  03_marts/
    dim_customers.sql
    fct_orders.sql
Numbered folders keep execution order obvious, while prefixes like stg_, int_, dim_, and fct_ make a model's role clear even when viewed out of context, such as in compiled SQL or logs.
dbt ref() Best Practices
Hard-coding schema and table names is the fastest way to break a dbt DAG. Instead, always reference other models with ref(). dbt uses these calls to build the dependency graph, validate that upstream models exist, and swap schemas between development and production targets without changes to SQL.
Always Prefer ref() Over Raw Table Names
-- Anti-pattern: hard-coded schema and table name
SELECT *
FROM analytics_staging.stg_orders;

-- Best practice: let dbt resolve the dependency through ref()
SELECT *
FROM {{ ref('stg_orders') }}
With this approach, renaming schemas, cloning environments, or reorganizing models becomes mostly configuration work instead of a dangerous search-and-replace across dozens of SQL files.
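As an illustration, a minimal profiles.yml sketch (the Snowflake adapter and all account, user, and schema names are placeholders) shows how dev and prod targets point the same ref()-based SQL at different databases and schemas:

# profiles.yml (sketch): adapter, account, roles, and schema names are placeholders
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: jane_dev
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      warehouse: transforming
      database: analytics_dev
      schema: dbt_jane
    prod:
      type: snowflake
      account: my_account
      user: dbt_service
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      warehouse: transforming
      database: analytics
      schema: analytics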
dbt Staging Layer Best Practices
Staging models should be boring on purpose. They read directly from sources, standardize column names, fix data types, and apply very light cleaning such as trimming whitespace or converting cents to dollars. They should not contain joins, aggregations, or complicated business rules.
By keeping staging models simple and one-to-one with upstream tables, you create a stable contract between raw ingestion and the rest of your dbt project. When a source adds a column or changes a data type, only a single staging file needs to be updated.
Example Simple Staging Model
{{ config(materialized='view') }}

SELECT
    id::int AS customer_id,
    email::varchar AS customer_email,
    created_at::timestamp AS created_at_utc,
    status::varchar AS customer_status
FROM {{ source('raw', 'customers') }}
dbt Testing Best Practices
Testing is one of the biggest reasons to adopt dbt. At minimum, every important model should enforce uniqueness and non-null constraints on primary keys, plus referential integrity on foreign keys. Beyond that, targeted business-rule tests catch issues that simple schema checks cannot see.
Instead of trying to test everything, focus on the models and columns that feed core metrics or critical dashboards. It is better to have strong coverage on 20% of the warehouse that actually drives decisions than thin coverage everywhere.
Example schema.yml With Core Tests
version: 2

models:
  - name: dim_customers
    description: "Customer dimension for reporting."
    columns:
      - name: customer_id
        description: "Primary key for the customer."
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
      - name: country_code
        tests:
          - accepted_values:
              values: ['US', 'CA', 'GB', 'AU', 'DE']
Generic tests like unique, not_null, accepted_values, and relationships cover most structural assumptions. More complex logic, such as "churned customers cannot have recent invoices", is best implemented as custom tests or unit tests.
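As a sketch of such a business-rule check, a singular test returns every row that violates the rule, and dbt fails the test if the query returns anything. The fct_invoices model, the customer_status column, and the 30-day window are illustrative assumptions, and DATEADD is Snowflake-style syntax:

-- tests/assert_churned_customers_have_no_recent_invoices.sql
-- Fails if any churned customer has an invoice in the last 30 days (illustrative rule).
SELECT
    c.customer_id
FROM {{ ref('dim_customers') }} AS c
JOIN {{ ref('fct_invoices') }} AS i
    ON i.customer_id = c.customer_id
WHERE c.customer_status = 'churned'
    AND i.invoice_date >= DATEADD('day', -30, CURRENT_DATE)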
dbt Materializations Best Practices
Choosing the right materialization has a major impact on cost and performance. A simple but effective rule is: use views for most staging and intermediate models, and reserve tables or incremental models for marts that feed dashboards or heavy ad-hoc querying. These layer-level defaults can be set once in dbt_project.yml, as sketched after the list below.
Views keep storage usage low and always return the latest data, because they are resolved at query time. Tables cost more storage but avoid recomputing expensive logic repeatedly. Incremental models shine when dealing with very large facts that grow by appends, such as events or orders.
LayerâBased Defaults
- Staging: view
- Intermediate: view (or ephemeral for small helper models)
- Marts: table or incremental, depending on size and access patterns
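A minimal dbt_project.yml sketch for these defaults, assuming the numbered folder names from the layout above and a project named my_project (a placeholder):

# dbt_project.yml (sketch): applies a default materialization per layer folder
models:
  my_project:
    01_staging:
      +materialized: view
    02_intermediate:
      +materialized: view
    03_marts:
      +materialized: table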
dbt Snapshots Best Practices
Many warehouses need Slowly Changing Dimensions (SCD) to track how entity attributes change over time. dbt snapshots handle this pattern by periodically comparing current rows to stored versions and writing new records when selected columns change.
Instead of hand-crafted MERGE statements, you declare how to detect changes, and dbt manages effective dates and history tables consistently across environments.
Example Snapshot for Customer Status
{% snapshot snap_customer_status %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['status', 'tier']
) }}

SELECT
    customer_id,
    status,
    tier,
    updated_at
FROM {{ ref('stg_customers') }}

{% endsnapshot %}
This pattern is ideal for tracking account states, subscription plans, pricing tiers, and other business attributes where historical context matters for analytics.
dbt Naming Conventions Best Practices
Clear, consistent naming conventions make dbt projects navigable. Names should communicate both the layer and the business concept. Prefixes are particularly useful because they remain visible in compiled SQL, logs, and data catalogs.
Common Prefix Patterns
- stg_ for staging models
- int_ for intermediate models
- dim_ for dimensions
- fct_ or fact_ for fact tables
- snap_ for snapshot definitions
Within each layer, aim for names that match how business stakeholders describe concepts, not just how source systems label them. This makes it easier to map analytics questions to the right models.
dbt Documentation Best Practices
Good documentation turns a dbt repository into a selfâserve data catalog. Descriptions on models and columns explain not just what a field is called, but how it should be interpreted and when it should be used.
Keeping documentation close to the code, inside schema.yml files, reduces drift. Running dbt docs generate and hosting the docs site gives analysts and stakeholders a visual map of the DAG and a searchable dictionary of metrics and entities.
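Generating and serving the site is a two-command workflow:

dbt docs generate   # compiles the project and writes manifest.json and catalog.json
dbt docs serve      # serves the generated docs site locally so the DAG can be browsed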
Example WellâDocumented Model
models:
  - name: fct_orders
    description: "Order-level fact table used for revenue and conversion reporting."
    columns:
      - name: order_id
        description: "Primary key for the order."
      - name: customer_id
        description: "Reference to dim_customers.customer_id."
      - name: order_date
        description: "Date the order was placed (UTC)."
      - name: gross_revenue
        description: "Total order value before discounts and refunds."
dbt Macros Best Practices
Whenever a piece of SQL logic is repeated in several models, it is a good candidate for a macro. Macros help keep code DRY, reduce copy-paste bugs, and centralize complex expressions that are hard to maintain in many places.
Common macro use cases include date spines, surrogate key generation, conditional filters, common CTE patterns, or masking sensitive data. Start small: once the same pattern appears three times, promote it into a macro.
Example Surrogate Key Macro
-- macros/surrogate_key.sql
{% macro surrogate_key(columns) %}
    md5({{ columns | join(" || '|' || ") }})
{% endmacro %}
Using this macro in multiple models keeps your key strategy consistent and makes it trivial to change hashing behavior later if needed.
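For example, a downstream model might build a composite key from two columns (the column names here are illustrative). One caveat: concatenating a NULL column makes the whole hash NULL in most warehouses, which is why the community macro dbt_utils.generate_surrogate_key coalesces its inputs first; the simple version above assumes non-null inputs.

-- Hypothetical usage in a downstream model
SELECT
    {{ surrogate_key(['customer_id', 'order_date']) }} AS customer_order_key,
    customer_id,
    order_date
FROM {{ ref('stg_orders') }}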
dbt CI/CD and Project Health Best Practices
As teams and projects grow, manual checks are not enough. Integrating dbt into a CI/CD pipeline ensures that every pull request runs compilation and targeted tests before changes reach production. This drastically reduces the risk of broken models or invalid data silently shipping to dashboards.
Many teams pair dbt with tools like SQLFluff for style and linting, plus project evaluators that enforce conventions around structure, naming, and dependencies. The goal is to turn your best practices into automated gates, not just tribal guidelines.
Example CI Steps
# Pseudocode for a CI pipeline
sqlfluff lint models/ # SQL style checks
dbt deps # Install packages
dbt compile # Validate DAG and refs
dbt test --select tag:critical # Run critical tests only
Running the full test suite on a schedule (for example, nightly) and a smaller critical subset on every commit strikes a good balance between safety and feedback speed.
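As a concrete sketch (GitHub Actions is assumed here; the same steps translate to other runners), one workflow can run the critical subset on every pull request and the full suite on a nightly schedule. The warehouse adapter, secrets, and a CI profiles.yml are assumptions:

# .github/workflows/dbt-ci.yml (sketch)
name: dbt-ci
on:
  pull_request:
  schedule:
    - cron: "0 4 * * *"          # nightly full run
jobs:
  dbt-checks:
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # assumed secret
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake sqlfluff   # adapter is an assumption
      - run: sqlfluff lint models/
      - run: dbt deps
      - run: dbt compile                          # assumes a CI profiles.yml is available
      - run: |
          if [ "${{ github.event_name }}" = "schedule" ]; then
            dbt test                              # full suite nightly
          else
            dbt test --select tag:critical        # fast subset on pull requests
          fi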
dbt Performance Best Practices
dbt performance scales with warehouse optimization techniques. Use incremental models for large fact tables (>1B rows), configure cluster_by (or partitioning) on date columns that are commonly used as filters, and use ephemeral materialization for lightweight CTE helpers that don’t need persistence.
Filter early to reduce data volume passing through the DAG. Avoid SELECT * in upstream models. Monitor query costs with warehouse query history and prioritize incremental models where recompute time exceeds 30 minutes.
Production Incremental Model
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge'
) }}

SELECT *
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
dbt Packages Best Practices
Leverage community packages like dbt_utils and dbt_expectations for battle-tested utilities. Add them to packages.yml, then run dbt deps to install them. Common packages save 100+ lines of custom SQL per project.
packages.yml Production Template
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: calogica/dbt_expectations
    version: 0.10.3
  - package: dbt-labs/codegen
    version: 0.12.0
Use {{ dbt_utils.date_spine() }} for date tables, dbt_expectations.expect_column_values_to_be_of_type for advanced type validation, and dbt-codegen to auto-generate model and source YAML boilerplate.
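For example, a calendar model built with dbt_utils.date_spine could look like the sketch below; the date range is arbitrary and the model path is a placeholder:

-- models/utilities/dim_date_spine.sql (sketch): one row per day in the range
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="cast('2026-01-01' as date)"
) }}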
dbt Best Practices FAQ
This section answers common questions engineers ask when applying dbt best practices in real projects.
Project Structure & Layers
Should I use three or four layers in my dbt project?
Three layers (staging, intermediate, marts) are enough for most teams. A fourth "raw" or "bronze" layer can be useful when you mirror ingestion tables inside dbt, but many warehouses already handle that separation outside dbt. Start with three, then add a separate raw layer only if you have a clear need.
How small should a dbt model be?
A useful rule of thumb is that a model should comfortably fit on one screen and do one logical thing. If a file grows beyond 150-200 lines of SQL or mixes several types of logic, consider splitting it into multiple models so tests and ownership stay focused.
Testing & Data Quality
How many dbt tests are too many?
You do not need every generic test on every column. Focus on primary keys, foreign keys, and the columns that power core metrics. Add business-rule tests where mistakes would be expensive, such as double-counted revenue or invalid lifecycle states. Beyond that, additional tests yield diminishing returns relative to warehouse cost.
Should I use unit tests in dbt?
Unit tests are powerful for complex logic that cannot be expressed as a simple constraint. They let you define small input tables and expected outputs. They take more effort to set up, so reserve them for the models that implement the most critical or tricky business rules.
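Since dbt 1.8, unit tests are declared in YAML next to the model. A minimal sketch, assuming a hypothetical int_customer_ltv model that sums order totals per customer (model and column names are illustrative):

unit_tests:
  - name: test_customer_ltv_sums_orders
    model: int_customer_ltv
    given:
      - input: ref('stg_orders')
        rows:
          - {customer_id: 1, order_total: 100}
          - {customer_id: 1, order_total: 50}
    expect:
      rows:
        - {customer_id: 1, lifetime_value: 150}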
Performance & Materializations
When should I switch a view to a table in dbt?
Consider materializing as a table when a model is queried very frequently, has expensive joins or aggregations, or sits directly under important BI dashboards. Monitor query times and warehouse costs; if a single view is responsible for a large share of compute, turning it into a table or incremental model can pay off quickly.
What are dbt performance best practices?
Filter early to reduce data volume, avoid unnecessary SELECT *, keep staging models lean, and use incremental models for large append-only facts. On platforms that support clustering or partitioning, align those settings with common filter patterns on date or key columns.