dbt Model Layers

TL;DR

Organize dbt models into 3 layers: Staging (1:1 with sources, rename/cast/clean), Intermediate (join/transform between staging and marts), and Marts (business-ready tables for end users). This is the dbt-recommended structure used by most teams.

Explain Like I'm 12

Think of a restaurant kitchen. Staging = the prep station where you wash vegetables, peel potatoes, dice onions. One station per ingredient, output is clean and consistent. Intermediate = the cooking station where you combine prepped ingredients into sauces, fillings, and bases. Marts = the plated dishes served to customers. Each dish (mart) is designed for a specific customer (business team).

The 3 Layers

1
Staging (stg_)
1:1 with source tables. Rename, cast, clean.
2
Intermediate (int_)
Join, aggregate, transform. Bridge staging to marts.
3
Marts (fct_ / dim_)
Business-ready. Consumed by BI tools and analysts.

Staging Layer

Purpose: Create a clean, consistent version of each raw source table. One staging model per source table.

RuleExample
Prefix: stg_stg_stripe__payments.sql
1:1 with sourceOne stg_ model per raw table
Rename columnsidpayment_id
Cast typesCAST(amount AS numeric)
Filter deleted rowsWHERE _fivetran_deleted = false
Materializationview (lightweight, always fresh)
Only ref'd byIntermediate and mart models
-- models/staging/stripe/stg_stripe__payments.sql
WITH source AS (
    SELECT * FROM {{ source('stripe', 'payments') }}
),

renamed AS (
    SELECT
        id AS payment_id,
        order_id,
        amount::numeric(10,2) AS amount,
        currency,
        status,
        created_at::timestamp AS created_at
    FROM source
    WHERE _fivetran_deleted = false
)

SELECT * FROM renamed

Intermediate Layer

Purpose: Complex transformations that bridge staging to marts. Joins, aggregations, business logic that doesn't belong in staging or marts alone.

RuleExample
Prefix: int_int_orders__pivoted_by_status.sql
Only ref staging/other int modelsNever ref raw source tables
Materializationephemeral or view (not exposed to end users)
Not queried by BI toolsInternal to dbt project only
-- models/intermediate/int_orders__joined.sql
SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    c.customer_segment,
    o.order_date,
    o.status,
    p.amount AS payment_amount,
    p.currency
FROM {{ ref('stg_shopify__orders') }} o
LEFT JOIN {{ ref('stg_shopify__customers') }} c ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('stg_stripe__payments') }} p ON o.order_id = p.order_id
When to create intermediate models: (1) A join is used by 2+ mart models (DRY). (2) A mart query is getting too long/complex. (3) You need to pivot, window-function, or aggregate before the final mart.

Marts Layer

Purpose: Business-ready tables consumed by BI tools, analysts, and other consumers. Organized by business domain (finance, marketing, product).

TypePrefixWhat It Contains
Factfct_Events/transactions: orders, clicks, payments. Grain = one row per event.
Dimensiondim_Entities/attributes: customers, products, regions. Grain = one row per entity.
Metricmrt_ or rpt_Pre-aggregated report tables: daily_revenue, weekly_retention.
-- models/marts/finance/fct_orders.sql
{{
  config(materialized='table')
}}

SELECT
    order_id,
    customer_id,
    customer_name,
    customer_segment,
    order_date,
    status,
    payment_amount,
    currency,
    CASE WHEN status = 'returned' THEN -payment_amount ELSE payment_amount END AS net_amount
FROM {{ ref('int_orders__joined') }}
Directory structure:
models/
├── staging/
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   └── stg_stripe__payments.sql
│   └── shopify/
│       ├── _shopify__sources.yml
│       ├── stg_shopify__orders.sql
│       └── stg_shopify__customers.sql
├── intermediate/
│   └── int_orders__joined.sql
└── marts/
    ├── finance/
    │   ├── fct_orders.sql
    │   └── dim_customers.sql
    └── marketing/
        └── fct_campaigns.sql

Naming Conventions

PatternExampleWhen
stg_{source}__{table}stg_stripe__paymentsAll staging models
int_{entity}__{verb}int_orders__pivoted_by_statusIntermediate models
fct_{event/process}fct_ordersFact tables (events)
dim_{entity}dim_customersDimension tables (entities)
Double underscore (__) separates source from table in staging names. Single underscore separates words. This makes it instantly clear where data comes from.

Test Yourself

Q: What belongs in a staging model?

Renaming columns, casting types, filtering deleted rows. One model per source table. No joins, no business logic. Materialized as views. The goal is a clean, consistent interface to raw data.

Q: When should you create an intermediate model?

When a join/transformation is used by 2+ mart models (DRY principle), when a mart query is too complex, or when you need aggregations/window functions before the final mart.

Q: What is the difference between fct_ and dim_ models?

fct_ = fact tables with events/transactions (orders, clicks). One row per event. Contain measures (amounts, counts). dim_ = dimension tables with entities/attributes (customers, products). One row per entity. Contain descriptive attributes used for filtering and grouping.

Interview Questions

Q: Walk me through how you'd structure a dbt project for an e-commerce company with Stripe (payments) and Shopify (orders, customers) as data sources.

Staging: stg_stripe__payments, stg_shopify__orders, stg_shopify__customers — rename, cast, filter each source. Views.
Intermediate: int_orders__joined (join orders + customers + payments). Ephemeral or view.
Marts: fct_orders (one row per order with all details), dim_customers (one row per customer with lifetime metrics), fct_daily_revenue (aggregated by day). Tables.
Tests: unique + not_null on all primary keys, relationships between staging models, accepted_values on status fields, custom test for negative amounts.

Q: Why materialize staging models as views and marts as tables?

Staging views are lightweight (no storage cost), always reflect latest source data, and build instantly. Marts as tables are pre-computed — BI tools query them directly and need fast response times. If staging models were tables, you'd double your storage and rebuild time for simple rename/cast operations.

Q: How do you decide between an incremental model and a full table rebuild?

Incremental when: (1) table is large (millions+ rows), (2) you can reliably identify new/changed rows (timestamp or ID), (3) full rebuild takes too long. Full table when: (1) data is small, (2) logic is complex and hard to make incremental, (3) source data can be updated retroactively (backfills). Default to table until rebuild time becomes a problem.