dbt Interview Questions
30+ dbt interview questions organized by topic. Focus on Models & Materializations and Testing — they come up in every analytics engineering interview.
Fundamentals
Q: What is dbt and how does it fit in the modern data stack?
Q: What is the difference between ref() and source()?
ref('model_name') references another dbt model. source('source_name', 'table_name') references a raw table loaded by an external tool. Both build the DAG, and source() also enables freshness monitoring. Never hardcode table names.

Q: What is dbt Core vs dbt Cloud?
Q: What are seeds in dbt?
CSV files in the seeds/ directory that dbt loads as tables. Use for small, static lookup data: country codes, status mappings, category hierarchies. Run with dbt seed. Not for large datasets — those should come through your EL tool.

Models & Materializations
Q: Explain the 4 materialization types. When do you use each?
Q: How do incremental models work? What happens on the first run?
On the first run there is no existing table, so dbt builds it from the full query. On subsequent runs, the is_incremental() block filters to only new/changed rows. The unique_key config determines upsert behavior (MERGE or DELETE+INSERT). Use --full-refresh to force a complete rebuild when logic changes.

Q: What are the 3 model layers and their naming conventions?
Staging (stg_{source}__{table}): 1:1 with source, rename/cast/clean, materialized as view. Intermediate (int_{entity}__{verb}): joins/transforms, ephemeral or view. Marts (fct_ / dim_): business-ready, materialized as table. Marts are organized by domain (finance/, marketing/).

Q: What is an ephemeral model and when would you use one?
Deeper coverage: Model Layers Deep Dive
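The incremental pattern above can be sketched as a minimal model (table and column names here are illustrative, not from a real project):

```sql
-- models/fct_orders.sql (illustrative)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge'
) }}

SELECT
    order_id,
    customer_id,
    order_total,
    updated_at
FROM {{ ref('stg_shop__orders') }}

{% if is_incremental() %}
  -- Only runs after the first build: {{ this }} is the existing target
  -- table, so we process only rows newer than what is already loaded.
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

On the first run (or with --full-refresh) the {% if is_incremental() %} block is skipped and the whole table is rebuilt; afterwards, rows matching unique_key are merged rather than appended.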
Testing & Quality
Q: What types of tests does dbt support?
Built-in generic tests: unique, not_null, accepted_values, relationships, applied in YAML. Singular tests: one-off SQL files in the tests/ directory that return rows that fail. Custom generic tests: Jinja macros that can be applied like built-in tests. dbt packages: dbt-expectations adds 50+ tests (e.g., expect_column_values_to_be_between).

Q: How do you implement data quality checks in a dbt pipeline?
(1) Generic tests on primary keys and enum columns. (2) Singular tests for business rules. (3) Source freshness checks (dbt source freshness). (4) dbt build runs tests after each model — failures block downstream. (5) dbt-expectations for statistical tests. (6) Elementary or re_data for anomaly detection.

Q: What is source freshness and how do you configure it?
Configured in the source YAML with a loaded_at_field (timestamp column) and freshness thresholds (warn_after, error_after). Run dbt source freshness to check. If the latest row is older than the threshold, it warns/errors. Essential for catching silent EL failures.

Advanced Concepts
Q: What is a snapshot in dbt and when would you use one?
Snapshots implement SCD Type 2 history via dbt_valid_from/dbt_valid_to columns. When a row changes, the old version gets an end date and a new version is inserted. Use for tracking changes to dimensions over time (customer tier changes, product price history). Strategy: timestamp (needs an updated_at column) or check (compares column values).

Q: How do Jinja macros work in dbt? Give an example.
Macros are reusable Jinja functions defined in macros/*.sql and called from models:
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(col) %}
({{ col }} / 100.0)::numeric(10,2)
{% endmacro %}
-- In a model:
SELECT {{ cents_to_dollars('amount_cents') }} AS amount
Also useful for environment-specific logic: {% if target.name == 'prod' %}.

Q: How do you set up CI/CD for dbt?
Either use dbt Cloud's built-in CI jobs, or run dbt in your own CI pipeline (e.g., GitHub Actions): (1) dbt deps, (2) dbt build --select state:modified+ with a manifest from prod (Slim CI: build only changed models and their descendants), (3) fail the PR if tests fail. Both approaches ensure untested code never reaches production.

Q: What is the dbt semantic layer?
Scenario Questions
Q: Your incremental model is producing duplicate rows. How do you debug and fix it?
(1) Check that unique_key is set — without it, dbt appends instead of upserting. (2) Verify the incremental filter logic — is the is_incremental() block correctly filtering to only new rows? (3) Check for late-arriving data that falls outside the filter window. (4) Run --full-refresh to rebuild cleanly. (5) Add a unique test on the primary key to catch duplicates early. (6) Consider using incremental_strategy='merge' with a proper unique_key.

Q: How would you orchestrate dbt in production with Airflow?
(1) Trigger the dbt DAG after the EL load completes. (2) Run dbt as an Airflow task (BashOperator or KubernetesPodOperator). (3) Use dbt build (or split into dbt run + dbt test). (4) Use cosmos (astronomer-cosmos) to parse the dbt DAG and create one Airflow task per dbt model for fine-grained monitoring. (5) On failure: alert, don't retry dbt models automatically (runs are idempotent, but retries may mask data issues).

Q: You inherit a dbt project with 200 models and no tests. How do you add data quality?
(1) Prioritize the marts that feed dashboards and downstream consumers. (2) Add unique + not_null tests on all primary keys — use the codegen package to auto-generate YAML. (3) Add relationships tests between staging and marts. (4) Add source freshness checks. (5) Add accepted_values on status/type columns. (6) Incrementally add singular tests for known business rules. (7) Integrate dbt build into CI to prevent regressions.

Q: A daily dbt run takes 3 hours. How would you speed it up?
(1) Find the slow models: check per-model timing in run_results.json (or dbt Cloud's model timing view) or the warehouse query history. (2) Convert large full-table rebuilds to incremental. (3) Materialize staging as views (they're instant). (4) Use dbt run -s tag:critical for priority models. (5) Parallelize — increase threads in profiles.yml. (6) Optimize slow SQL (check EXPLAIN, add clustering keys). (7) Consider splitting into multiple runs (staging hourly, marts daily).
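Two of the levers above in concrete form, as a minimal sketch (project, target, and tag names are illustrative):

```yaml
# profiles.yml (illustrative): raise the thread count so independent
# models in the DAG build in parallel rather than one at a time
my_project:
  target: prod
  outputs:
    prod:
      type: snowflake
      threads: 16   # defaults are often 1-4; bounded by warehouse capacity
      # ...plus the usual connection settings (account, user, database, schema)
```

Splitting the schedule then becomes a selection problem, e.g. dbt run --select tag:staging on an hourly job and dbt build --select tag:marts on the daily job, assuming models carry those tags.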