The Tuva Project Interview Questions
25+ interview questions about The Tuva Project, organized by topic. Covers architecture, data marts, risk adjustment, quality measures, terminology, dbt integration, and real-world healthcare analytics scenarios.
Tuva Fundamentals
Q: What is The Tuva Project and what problem does it solve?
Q: What is the Input Layer in Tuva?
Q: How do Tuva Connectors work?
Q: What data warehouses does Tuva support?
Q: Why is Tuva built on dbt rather than a custom framework?
• Dependency management: dbt's ref() function ensures models run in the correct order across the 7-stage pipeline.
• Testing: dbt's built-in testing framework validates data quality at every stage (not null, unique, accepted values, relationships).
• Seed files: dbt seeds are perfect for loading terminology data (ICD-10 codes, HCC mappings).
• Package management: Tuva can be installed as a dbt package via packages.yml, making version management trivial.
• Community: the large dbt community means more contributors, faster bug fixes, and easier hiring.
Want deeper coverage? See Tuva Project Overview and Core Concepts.
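Because Tuva ships as a standard dbt package, installation is a small configuration change. A sketch of the packages.yml entry (the package coordinates and version range here are illustrative; check the dbt package hub for the current name and release):

```yaml
# packages.yml — install Tuva as a dbt package.
# Package name and version range are illustrative, not pinned to a real release.
packages:
  - package: tuva_health/the_tuva_project
    version: [">=0.10.0", "<0.11.0"]
```

After adding this entry, `dbt deps` downloads the package and `dbt seed` loads its terminology files.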
Architecture & Pipeline
Q: Walk through Tuva's 7-stage pipeline from raw data to analytics output.
1. Input Layer: Raw healthcare data mapped to Tuva's standardized schema (medical claims, pharmacy claims, eligibility, providers).
2. Connectors: Pre-built transformations that map specific source systems (Medicare LDS, Athena, etc.) into the Input Layer format.
3. Staging: Initial data cleaning, type casting, and basic validation of Input Layer data.
4. Normalization: Maps source-specific codes to standard terminologies (ICD-10, CPT, etc.) and standardizes formats across all sources.
5. Claims Preprocessing: Healthcare-specific logic — claim grouping, service categorization, encounter assignment, duplicate resolution.
6. Core Data Model: The clean, standardized analytical foundation with tables for conditions, encounters, eligibility spans, procedures, and medications.
7. Data Marts: 13+ pre-built analytics modules (CMS-HCC, Quality Measures, Readmissions, PMPM, etc.) that produce analytics-ready output tables.
Q: What is the difference between Normalization and Claims Preprocessing?
Normalization is about consistency: it maps source-specific codes to standard terminologies (ICD-10, CPT, NDC) and standardizes formats and values across all sources. Claims Preprocessing focuses on healthcare-specific business logic — it applies clinical and billing rules that require domain knowledge. This includes: grouping claim lines into complete claims, assigning service categories (inpatient, outpatient, ED, professional), identifying encounters from individual claims, resolving duplicate or overlapping claims, and applying claim type hierarchies. Normalization makes the data consistent; Claims Preprocessing makes it analytically meaningful.
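One Claims Preprocessing task named above, grouping claim lines into complete claims, reduces to a group-by. A toy sketch (the field layout is hypothetical, not Tuva's actual schema):

```python
from collections import defaultdict

# Toy claim lines: (claim_id, line_number, paid_amount). In real claims data,
# one billed claim arrives as many lines that must be regrouped.
claim_lines = [
    ("c1", 1, 120.0),
    ("c1", 2, 80.0),
    ("c2", 1, 500.0),
]

# Group lines back into complete claims and total the paid amounts.
claims = defaultdict(list)
for claim_id, line_number, paid in claim_lines:
    claims[claim_id].append(paid)

totals = {claim_id: sum(amounts) for claim_id, amounts in claims.items()}
print(totals)  # {'c1': 200.0, 'c2': 500.0}
```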
Q: What does the Core Data Model contain?
• condition: Patient conditions with standardized ICD-10 codes, onset dates, and status
• encounter: Clinical encounters (inpatient stays, ED visits, office visits) with dates, types, and providers
• eligibility: Member enrollment spans with plan details and coverage dates
• medical_claim: Standardized medical claims with normalized codes and amounts
• pharmacy_claim: Standardized pharmacy claims with NDC codes and costs
• procedure: Procedures performed with CPT/HCPCS codes
• lab_result: Laboratory test results with LOINC codes
• medication: Medication records with RxNorm codes
The Core Model is source-agnostic — regardless of whether data came from Medicare, a commercial payer, or an EHR, it all conforms to the same schema.
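The source-agnostic claim can be illustrated with a toy table: rows from two different feeds share one schema, so one query serves both. A sketch using sqlite3 with simplified column names (not Tuva's exact schema):

```python
import sqlite3

# Toy version of the Core Data Model's condition table (columns simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE condition (
        person_id            TEXT,
        normalized_code_type TEXT,  -- e.g. 'icd-10-cm'
        normalized_code      TEXT,
        data_source          TEXT   -- which upstream feed the row came from
    )
""")

# Rows from a Medicare feed and an EHR feed conform to the same schema.
conn.executemany(
    "INSERT INTO condition VALUES (?, ?, ?, ?)",
    [
        ("p1", "icd-10-cm", "E11.9", "medicare_lds"),
        ("p2", "icd-10-cm", "E11.9", "ehr_extract"),
    ],
)

# One source-agnostic query answers "who has this diagnosis?" for every feed.
rows = conn.execute(
    "SELECT person_id, data_source FROM condition "
    "WHERE normalized_code = 'E11.9' ORDER BY person_id"
).fetchall()
print(rows)  # members from both sources in a single result set
```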
Q: How do data marts relate to the Core Data Model?
Q: How does Tuva handle data quality?
Deeper coverage: Core Concepts
Data Marts & Analytics
Q: What is CMS-HCC and how does Tuva implement it?
Q: Explain PMPM and how Tuva's Financial PMPM mart calculates it.
Q: How does the readmissions mart work?
Q: What is the difference between HCC suspecting and HCC recapture?
HCC Suspecting identifies conditions that are likely present but never documented on claims. It uses indirect evidence: a patient on insulin (from pharmacy claims) without a diabetes diagnosis on medical claims, or abnormal lab values without a corresponding condition code. These are net-new HCC opportunities — conditions that have never been coded.
HCC Recapture tracks conditions that were documented in prior years but haven't been re-documented this year. Since CMS requires chronic conditions to be documented annually for risk adjustment, any condition that "drops off" means lost RAF score. These are documentation gaps — the condition still exists, it just hasn't been coded yet this year.
Together, they form a complete revenue integrity strategy: suspecting finds new money, recapture prevents losing existing money.
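The suspecting/recapture split above can be expressed as a small decision function. A toy sketch (the flag names are hypothetical, not the mart's actual output columns):

```python
def classify_hcc_opportunity(has_current_dx: bool,
                             has_prior_dx: bool,
                             has_indirect_evidence: bool) -> str:
    """Toy classification of one member/condition pair."""
    if has_current_dx:
        return "documented"   # already coded this year, no gap
    if has_prior_dx:
        return "recapture"    # coded in a prior year, dropped off this year
    if has_indirect_evidence:
        return "suspecting"   # e.g. insulin fills with no diabetes diagnosis
    return "none"

print(classify_hcc_opportunity(False, True, False))   # recapture
print(classify_hcc_opportunity(False, False, True))   # suspecting
```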
Q: Name 5 data marts in Tuva and explain their use cases.
1. CMS-HCC: Calculates HCC-based RAF risk scores for Medicare Advantage members. The foundation of risk adjustment analytics and revenue forecasting.
2. Quality Measures (HEDIS): Calculates HEDIS-style quality measures, identifies care gaps, and tracks compliance rates. Feeds directly into CMS STAR rating improvement efforts.
3. Financial PMPM: Calculates Per-Member-Per-Month costs broken down by service category, provider, and time period. The foundation of healthcare financial reporting.
4. Chronic Conditions: Identifies and groups patients by chronic conditions using CMS CCW definitions. Powers population health segmentation and care management targeting.
5. Readmissions: Flags 30-day all-cause readmissions and classifies them as planned vs. unplanned. Used for CMS penalty avoidance and quality improvement initiatives.
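The 30-day check at the heart of the Readmissions mart boils down to a date difference between an index discharge and the next admission. A minimal sketch (the real mart also applies planned-readmission exclusions and other clinical logic):

```python
from datetime import date

def is_30_day_readmission(discharge: date, next_admit: date) -> bool:
    """True if the next admission falls within 30 days after the discharge."""
    delta = (next_admit - discharge).days
    return 0 < delta <= 30

print(is_30_day_readmission(date(2024, 1, 1), date(2024, 1, 20)))  # True
print(is_30_day_readmission(date(2024, 1, 1), date(2024, 3, 1)))   # False
```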
Deeper coverage: Data Marts & Analytics
Terminology & Value Sets
Q: How does Tuva manage healthcare terminology?
Tuva distributes terminology (code sets like ICD-10, CPT, NDC, LOINC, and RxNorm, plus value sets like the HCC mappings) as versioned seed files. When you run dbt seed, these files are automatically downloaded and loaded into your data warehouse. The version is pinned in dbt_project.yml, ensuring reproducibility. This replaces the traditional approach of manually downloading code files from CMS websites, parsing them, and building lookup tables — a process that typically took weeks of engineering time per update.
Q: What is the difference between code systems and value sets in Tuva?
Code systems are the full standard vocabularies themselves (ICD-10, CPT/HCPCS, NDC, LOINC, RxNorm), each maintained by an external standards body. Value sets are curated subsets of codes grouped for specific analytics purposes. The CMS-HCC value set maps specific ICD-10 codes to HCC categories. CCSR value sets group ICD-10 codes into ~530 clinically meaningful categories. Quality measure value sets define which codes constitute the eligible population and numerator criteria for each HEDIS measure. They answer: "which codes belong to this concept?"
Every data mart depends on value sets. Without the HCC mapping value set, the CMS-HCC mart can't calculate RAF scores. Without measure specification value sets, the Quality Measures mart can't identify care gaps.
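Operationally, a value set is just a code-to-concept mapping that marts join against. A toy sketch (the two mappings below follow the HCC examples used elsewhere in this document, but real mappings are model-version specific):

```python
# Toy value set: ICD-10-CM code -> HCC category.
# Mappings are illustrative and depend on the CMS-HCC model version.
icd10_to_hcc = {
    "E11.9": "HCC-19",  # diabetes without complication
    "I50.9": "HCC-85",  # congestive heart failure
}

def codes_in_value_set(codes, value_set):
    """Return the subset of codes that belong to the value set."""
    return [code for code in codes if code in value_set]

# Only codes present in the value set contribute to the concept.
print(codes_in_value_set(["E11.9", "Z00.00"], icd10_to_hcc))  # ['E11.9']
```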
Q: How does Tuva's normalization process use terminology to standardize data?
Q: How are terminology updates handled in Tuva?
1. Update the tuva_terminology_version variable in dbt_project.yml to the new version number.
2. Run dbt seed to download and load the updated terminology files.
3. Run dbt build to recalculate all data marts with the new terminology.
Version pinning ensures reproducibility — the same version number always produces the same terminology data. This is critical for regulatory compliance: if CMS audits risk adjustment submissions, you need to prove which terminology version (and therefore which HCC crosswalk) was used. You can also run multiple versions in parallel to compare impact (e.g., "how would RAF scores change under the new CMS-HCC model version?").
Deeper coverage: Terminology & Value Sets
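The update workflow above hinges on a single pinned variable. A sketch of the relevant dbt_project.yml fragment (the version value shown is illustrative):

```yaml
# dbt_project.yml — pin the terminology release (value illustrative)
vars:
  tuva_terminology_version: "2024.2"
```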
dbt Integration & Practical
Q: How do you install and configure Tuva in a dbt project?
1. Add the Tuva package to your packages.yml file with the desired version.
2. Run dbt deps to download the package.
3. Configure your dbt_project.yml with Tuva-specific variables — database/schema settings, which data marts to enable, terminology version, and source table references.
4. Build or configure your connector to map your source data to Tuva's Input Layer.
5. Run dbt seed to load terminology data.
6. Run dbt build to execute the entire pipeline.
The most time-consuming step is typically building the connector (step 4) if a pre-built one doesn't exist for your source system. Everything else is configuration.
Q: What dbt variables control which data marts run in Tuva?
Tuva exposes boolean variables in dbt_project.yml to control mart execution. There's a master switch: tuva_marts_enabled: true enables all marts at once. For granular control, individual variables exist for each mart: cms_hcc_enabled, quality_measures_enabled, readmissions_enabled, financial_pmpm_enabled, chronic_conditions_enabled, ed_classification_enabled, etc. Setting any variable to true activates that mart; false (or omitting it) skips it. This is useful because not every organization needs every mart. A commercial health plan might enable CMS-HCC and Quality Measures but skip AHRQ measures. A hospital system might focus on Readmissions and ED Classification.
Q: How would you build a custom Connector for a new data source?
Q: How does Tuva's architecture leverage dbt features like refs, tests, and seeds?
• ref(): Every model in the 7-stage pipeline uses ref() to reference upstream models, creating a dependency graph that ensures correct execution order. This is what makes the pipeline work — the CMS-HCC mart automatically knows it depends on the Core Model, which depends on Claims Preprocessing, and so on.
• Tests: Tuva includes hundreds of built-in tests across the pipeline: not_null on critical fields, unique on primary keys, accepted_values on categorical fields (e.g., gender must be 'male' or 'female'), and relationships tests that validate foreign keys between tables. These run automatically during dbt build.
• Seeds: All terminology data (ICD-10 codes, HCC mappings, value sets) is loaded via dbt seed. This makes terminology version-controlled, portable across warehouses, and easy to update.
• Macros: Tuva uses dbt macros for reusable SQL logic (e.g., date calculations, code validation functions) and for warehouse-specific SQL generation (handling Snowflake vs. BigQuery syntax differences).
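Tests of the kind described above are declared in dbt schema YAML. A hedged sketch of such a declaration (the model and column names are illustrative, not Tuva's actual files):

```yaml
# schema.yml — dbt tests of the kind Tuva applies (names illustrative)
models:
  - name: patient
    columns:
      - name: person_id
        tests:
          - unique
          - not_null
      - name: gender
        tests:
          - accepted_values:
              values: ['male', 'female']
```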
Scenario Questions
Q: A health plan wants to improve their CMS STAR rating. How would you use Tuva to identify gaps in care?
1. Enable the Quality Measures mart: Set quality_measures_enabled: true and run dbt build. This calculates HEDIS-style measures across the member population.
2. Identify measures closest to STAR thresholds: Query the summary tables to find measures where your compliance rate is just below the next STAR cut-point. A measure at 71% where 72% earns an extra star is your highest-ROI target.
3. Generate member-level care gap lists: For priority measures, pull the list of members in the denominator (eligible) who are not in the numerator (measure not met). These are your actionable care gaps.
4. Segment by provider: Cross-reference care gaps with the Provider Attribution mart to identify which medical groups have the most open gaps. Target provider education and incentives there.
5. Cross-reference with Chronic Conditions: Members with multiple chronic conditions often have more care gaps. Use the Chronic Conditions mart to prioritize outreach to the highest-risk members.
6. Track over time: Run the Quality Measures mart monthly to track gap closure rates and forecast year-end STAR performance.
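Step 3 above is a set difference: members in the denominator but not in the numerator. A minimal sketch with toy member IDs:

```python
# Care gaps = members eligible for a measure (denominator) who have not
# met it (numerator). Member IDs are toy values.
denominator = {"m1", "m2", "m3", "m4"}
numerator = {"m1", "m3"}

care_gaps = sorted(denominator - numerator)
print(care_gaps)  # ['m2', 'm4'] — the actionable outreach list

compliance_rate = len(numerator) / len(denominator)
print(compliance_rate)  # 0.5 — compare against the STAR cut-point
```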
Q: Your risk adjustment team suspects undercoding. How would you use Tuva's HCC suspecting mart?
1. Enable the suspecting mart: Turn on the HCC suspecting mart in dbt_project.yml and run dbt build.
2. Review the suspecting list: The mart produces a list of members with conditions suggested by indirect evidence but not documented on medical claims. Key signals include:
• Pharmacy-to-diagnosis gaps: Member is on insulin but has no diabetes diagnosis code
• Lab-to-diagnosis gaps: Abnormal HbA1c values without a diabetes diagnosis
• Historical conditions: Chronic conditions documented in prior years but missing from current claims (this overlaps with recapture)
3. Prioritize by RAF impact: Not all suspected HCCs are equal. Sort by the HCC coefficient to focus on conditions that would add the most to RAF scores. HCC 85 (CHF) has a much higher coefficient than HCC 19 (Diabetes without complication).
4. Generate provider worklists: Group suspected conditions by rendering provider. Send targeted lists to coding teams and providers for chart review and documentation improvement.
5. Quantify revenue impact: Calculate: (suspected RAF increase) x (number of affected members) x (monthly capitation rate) x 12 = annual revenue opportunity.
6. Compliance guardrail: Suspected conditions must be confirmed through legitimate chart review. The suspecting mart identifies opportunities — it does NOT justify coding without clinical documentation.
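The step-5 formula is simple to compute. A sketch with made-up inputs (the RAF lift and capitation rate below are illustrative, not CMS figures):

```python
def annual_revenue_opportunity(raf_increase: float,
                               members: int,
                               monthly_cap_rate: float) -> float:
    """Step-5 formula: RAF increase x members x monthly capitation x 12."""
    return raf_increase * members * monthly_cap_rate * 12

# Illustrative only: 0.25 RAF lift across 100 members at $800/month.
print(annual_revenue_opportunity(0.25, 100, 800))  # 240000.0
```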
Q: You need to build a financial dashboard showing PMPM trends. What Tuva tables would you use?
The Financial PMPM mart's primary output tables are pmpm_prep and pmpm_payer_plan.
Dashboard components:
• Total PMPM trend line: Monthly total PMPM from pmpm_payer_plan, showing medical + pharmacy combined. Add year-over-year comparison.
• Service category breakdown: Stacked bar/area chart showing inpatient, outpatient, professional, ED, and pharmacy PMPM components. Identifies which categories drive cost changes.
• Medical vs. pharmacy split: Two trend lines showing how the mix between medical and pharmacy spend is shifting.
• Top cost drivers: Join PMPM data with the CCSR or Chronic Conditions mart to show which conditions or clinical categories drive the most cost.
• Provider-level analysis: PMPM by provider group, highlighting outliers (providers significantly above or below peers).
Supporting context from other marts: Chronic Conditions mart for disease prevalence context, CMS-HCC mart for risk-adjusted comparisons (a plan with sicker members should have higher PMPM), and eligibility data for member month calculations and population mix analysis.
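Underneath these tables, PMPM is a simple ratio of spend to member months. A minimal sketch (the function name and figures are illustrative):

```python
def pmpm(total_paid: float, member_months: float) -> float:
    """Per-member-per-month cost: total spend divided by member months."""
    return total_paid / member_months

# 1,000 members enrolled all 12 months = 12,000 member months.
print(pmpm(4_800_000, 12_000))  # 400.0 -> $400 PMPM
```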
Q: You're migrating from a custom analytics pipeline to Tuva. What's your approach?
Phase 1 — Assessment (2 weeks): Document the current pipeline: what data sources feed it, what transformations it applies, what outputs it produces. Map each existing output to a Tuva equivalent. Identify any custom analytics that Tuva doesn't cover (you'll need to build these as extensions).
Phase 2 — Parallel build (4-6 weeks): Install Tuva alongside the existing pipeline. Build or configure connectors for your data sources. Enable the data marts that correspond to your current outputs. Run both pipelines in parallel.
Phase 3 — Validation (2-4 weeks): Compare outputs between the old pipeline and Tuva. Discrepancies will exist — investigate each one. Common causes: different claim grouping logic, different code mapping versions, different business rules for edge cases. Document and resolve each difference. Get sign-off from analytics consumers that Tuva outputs are acceptable.
Phase 4 — Cutover (1-2 weeks): Redirect BI tools and downstream consumers to Tuva output tables. Keep the old pipeline available (read-only) for 30 days as a safety net. Decommission the old pipeline after the grace period.
Key risks: The custom pipeline likely has undocumented business rules and edge case handling. Expect the validation phase to take longer than planned. Involve domain experts (actuaries, quality analysts) in the comparison to catch subtle differences.
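The Phase 3 output comparison can be automated as a per-metric diff with a tolerance. A minimal sketch (the metric, months, and tolerance values are illustrative):

```python
# Compare one metric (e.g. monthly PMPM) between the old pipeline and Tuva.
old_pipeline = {"2024-01": 401.2, "2024-02": 398.7}
tuva_output = {"2024-01": 401.2, "2024-02": 402.1}

TOLERANCE = 1.0  # acceptable absolute difference, set per metric

# Months where the two pipelines disagree beyond the tolerance need
# investigation before cutover.
discrepancies = {
    month: (old_pipeline[month], tuva_output[month])
    for month in old_pipeline
    if abs(old_pipeline[month] - tuva_output[month]) > TOLERANCE
}
print(discrepancies)  # only the months that exceed the tolerance
```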