What is The Tuva Project?

Disclaimer: For informational purposes only. This content is not medical or legal advice.

TL;DR

The Tuva Project is an open-source dbt framework that transforms raw healthcare claims and clinical data into analytics-ready datasets. It gives you a complete healthcare data pipeline — connectors, data quality, normalization, a core data model, and 13+ pre-built data marts for risk adjustment, quality measures, readmissions, and more.

The Big Picture

Tuva takes the hardest part of healthcare analytics, turning raw, messy data into something you can actually analyze, and gives you an open-source pipeline that handles each step, from source mapping through pre-built data marts.

The Tuva Project pipeline: Raw Data → Connectors → Input Layer → Data Quality → Normalization → Claims Preprocessing → Core Data Model → Data Marts

Explain Like I'm 12

Imagine you have a huge messy box of hospital receipts, pharmacy slips, and doctor visit records — all in different formats. Tuva is like a magical sorting machine. You dump in the mess, it checks for errors, translates everything into the same language, organizes it into neat folders, and then automatically creates reports: Who's at risk? What's costing the most? Are patients getting the right care? It's built on dbt, so it's all SQL you can read and customize.

What is The Tuva Project?

The Tuva Project is an open-source healthcare data framework licensed under Apache 2.0, created by Tuva Health. At its core, it's a dbt package that runs inside your data warehouse — Snowflake, BigQuery, Redshift, or DuckDB.

You point it at your raw claims or clinical data, and it transforms that data into analytics-ready datasets. No custom pipelines, no proprietary tools, no black boxes. Just SQL you can read, test, and customize.

With 300+ GitHub stars and a growing community, Tuva is gaining traction as an open-source approach to healthcare data transformation.

Why Tuva Exists

Healthcare data is uniquely hard to work with. Here's why:

  • Every payer and EHR formats data differently — there's no universal standard for how claims or clinical records are structured.
  • Healthcare analytics requires deep domain expertise — risk adjustment, quality measures, and claims preprocessing involve complex business logic that takes years to learn.
  • Teams rebuild the same pipelines over and over — every health plan, ACO, and analytics vendor writes similar SQL for similar problems.

Tuva codifies this domain expertise into reusable, tested, open-source SQL. Instead of every team reinventing the wheel, you inherit years of healthcare data engineering knowledge as a dbt package.

Who is it for?

Tuva is built for anyone working with healthcare data on a modern data stack:

  • Healthcare data engineers building data pipelines for claims and clinical data
  • Analytics engineers using dbt to model healthcare data
  • BI developers creating dashboards for health plan performance
  • Health plan analysts working on risk adjustment, quality measures, or cost analytics
  • ACO data teams tracking quality metrics and shared savings
  • Anyone building healthcare analytics on Snowflake, BigQuery, Redshift, or DuckDB

The Pipeline at a Glance

Tuva processes healthcare data through a 7-step pipeline. Each step builds on the previous one:

  1. Connectors: Map your raw source data into Tuva's expected format
  2. Input Layer: Standardized tables that the rest of Tuva reads from
  3. Data Quality: 100+ automated checks on completeness, validity, and consistency
  4. Normalization: Standardize codes to ICD-10, CPT, SNOMED, and other terminologies
  5. Claims Preprocessing: Group claim lines into encounters and assign service categories
  6. Core Data Model: Unified patient-centric schema with patient, condition, encounter, and medication tables
  7. Data Marts: 13+ pre-built analytics, including risk adjustment, HEDIS, readmissions, and PMPM
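
To make the first few stages concrete, here is a minimal Python sketch of the idea behind them. It is a toy, not Tuva's code: Tuva implements these stages as SQL models in dbt, and every field name, sample row, and rule below is a hypothetical chosen for illustration. The assumption that normalization upper-cases diagnosis codes and drops the decimal point is ours, and the "one encounter per claim ID" rule is far simpler than real claims preprocessing.

```python
from collections import defaultdict

# Hypothetical raw extract: two lines of one claim plus one incomplete claim.
RAW_CLAIM_LINES = [
    {"ClaimNo": "A1", "MemberID": "m-001", "DxCode": "e11.9", "PaidAmt": "120.50"},
    {"ClaimNo": "A1", "MemberID": "m-001", "DxCode": "e11.9", "PaidAmt": "30.00"},
    {"ClaimNo": "A2", "MemberID": "", "DxCode": "I10", "PaidAmt": "80.00"},
]

# Connector: rename source columns to one agreed layout (names are illustrative).
COLUMN_MAP = {
    "ClaimNo": "claim_id",
    "MemberID": "person_id",
    "DxCode": "diagnosis_code",
    "PaidAmt": "paid_amount",
}


def connect(raw_rows):
    """Map each raw row onto the standardized column names."""
    return [{COLUMN_MAP[key]: value for key, value in row.items()} for row in raw_rows]


def data_quality(rows):
    """Return (claim_id, problem) pairs instead of silently fixing anything."""
    issues = []
    for row in rows:
        if not row["person_id"]:
            issues.append((row["claim_id"], "missing person_id"))
        try:
            float(row["paid_amount"])
        except ValueError:
            issues.append((row["claim_id"], "paid_amount is not numeric"))
    return issues


def normalize(rows):
    """Toy normalization: upper-case codes, drop the dot, parse amounts."""
    normalized = []
    for row in rows:
        cleaned = dict(row)
        cleaned["diagnosis_code"] = row["diagnosis_code"].upper().replace(".", "")
        cleaned["paid_amount"] = float(row["paid_amount"])
        normalized.append(cleaned)
    return normalized


def group_into_encounters(rows):
    """Toy claims preprocessing: one encounter per claim_id, summing line payments."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["claim_id"]] += row["paid_amount"]
    return dict(totals)


input_layer = connect(RAW_CLAIM_LINES)
issues = data_quality(input_layer)
normalized = normalize(input_layer)
encounters = group_into_encounters(normalized)

print(issues)
print(encounters)
```

Each function only consumes the output of the one before it, which mirrors the "each step builds on the previous one" ordering described above. The quality check reports problems rather than dropping rows; how flagged records are handled downstream is left open here.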

Key Features

  • Connectors: Map raw data from any source into Tuva's standardized Input Layer format
  • Data Quality: 100+ automated checks validate completeness, validity, and consistency
  • Normalization: Standardize codes to ICD-10, CPT, SNOMED, and other terminologies
  • Core Data Model: Unified patient-centric schema with patient, condition, encounter, and medication tables
  • 13+ Data Marts: Pre-built analytics for CMS-HCC risk adjustment, HEDIS, readmissions, PMPM, and more
  • Terminology: Ships with ICD-10, CPT, SNOMED, HCPCS, and NDC codes, plus CMS value sets
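
As a worked example of one metric the data marts cover: PMPM (per member per month) is total spend divided by total member months. The sketch below is a simplified illustration in Python, not Tuva's implementation. The enrollment spans and paid amounts are made-up numbers, and counting any partially covered month as a full member month is an assumption of this sketch.

```python
from datetime import date


def member_months(start, end):
    """Count calendar months from start to end, inclusive of both ends."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Hypothetical enrollment spans per member.
enrollment = {
    "m-001": (date(2024, 1, 1), date(2024, 12, 31)),  # enrolled all year
    "m-002": (date(2024, 7, 1), date(2024, 12, 31)),  # enrolled July onward
}

# Hypothetical paid amounts across all claims for these members.
paid_amounts = [1200.00, 300.00, 2100.00]

total_member_months = sum(member_months(s, e) for s, e in enrollment.values())
total_paid = sum(paid_amounts)
pmpm = total_paid / total_member_months

print(f"member months: {total_member_months}, PMPM: {pmpm:.2f}")
```

Here the two members contribute 12 and 6 member months, so 3,600 in paid claims spread over 18 member months gives a PMPM of 200.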

Supported Warehouses

Tuva runs wherever your data lives. It supports the major cloud data warehouses and a local development option:

  • Snowflake — Full production support
  • BigQuery — Full production support
  • Redshift — Full production support
  • DuckDB — Local development and testing

Because Tuva is a dbt package, it compiles to native SQL for your warehouse. No additional infrastructure needed — it runs inside your existing dbt project.
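
Since Tuva is a dbt package, running it generally comes down to two standard dbt commands: `dbt deps` to install the packages declared in your project, then `dbt build` to run and test the models. The Python sketch below only assembles and optionally executes those two commands. The project directory name is hypothetical, and the package entry and any Tuva-specific variables your project needs are not shown; take those from Tuva's own documentation.

```python
import shutil
import subprocess


def dbt_commands(project_dir):
    """Return the two dbt CLI calls in the order they would run."""
    return [
        ["dbt", "deps", "--project-dir", project_dir],   # install declared packages
        ["dbt", "build", "--project-dir", project_dir],  # run and test the project
    ]


def run_pipeline(project_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for cmd in dbt_commands(project_dir):
        line = " ".join(cmd)
        print(line)
        rendered.append(line)
        if not dry_run:
            if shutil.which("dbt") is None:
                raise RuntimeError("dbt executable not found on PATH")
            subprocess.run(cmd, check=True)
    return rendered


# Dry run against a hypothetical project directory.
planned = run_pipeline("my_dbt_project")
```

The dry-run default lets you check what would be executed before pointing it at a real warehouse connection.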

What You'll Learn

This topic walks you through The Tuva Project, from the big picture to deep implementation details.

Test Yourself

What type of tool is Tuva (framework, SaaS, API)?

Tuva is an open-source dbt framework (a dbt package). It's not a SaaS product or an API — it's SQL code that runs inside your own data warehouse. You install it as a dbt dependency and run it with dbt build.

What are the 7 stages of the Tuva pipeline?

The 7 stages are: (1) Connectors — map raw data to standard format, (2) Input Layer — standardized tables, (3) Data Quality — 100+ automated checks, (4) Normalization — standardize codes, (5) Claims Preprocessing — group claims into encounters, (6) Core Data Model — unified patient-centric schema, (7) Data Marts — 13+ pre-built analytics.

What data warehouses does Tuva support?

Tuva supports Snowflake, BigQuery, and Redshift for production use, and DuckDB for local development and testing. Because it's a dbt package, it compiles to native SQL for each warehouse.

Why is healthcare data particularly hard to work with?

Healthcare data is hard because: (1) every payer and EHR formats data differently — there's no universal standard, (2) healthcare analytics requires deep domain expertise in areas like risk adjustment, quality measures, and claims preprocessing, and (3) teams end up rebuilding the same complex pipelines over and over. Tuva exists to solve all three problems.