Microsoft Fabric Interview Questions

TL;DR

30+ Microsoft Fabric interview questions with answers, organized by topic. Covers architecture & OneLake, lakehouse vs warehouse, Direct Lake & Power BI, data pipelines, governance, and scenario-based questions.

Short on time? Focus on Architecture & OneLake and Lakehouse vs Warehouse first — these come up in almost every Fabric interview. Then hit Scenario Questions for hands-on design problems.

Architecture & OneLake

Q: What is Microsoft Fabric and how does it differ from Azure Synapse Analytics?

Fabric is a unified SaaS analytics platform that integrates data engineering, warehousing, data science, real-time analytics, and Power BI under one product with one storage (OneLake), one security model, and one billing model (capacity units). Synapse was one component in a fragmented Azure analytics ecosystem that required separate services for Power BI, ADF, ADLS, etc.

Q: What is OneLake? How is it different from Azure Data Lake Storage (ADLS) Gen2?

OneLake is Fabric’s built-in, tenant-wide storage layer built on top of ADLS Gen2. The difference: you don’t create or manage storage accounts — every Fabric tenant gets one OneLake automatically. All Fabric items store data as Delta Parquet in OneLake. It’s “OneDrive for data.”

Q: What storage format does Fabric use internally, and why?

Delta Parquet (Parquet data files + Delta transaction log). This gives ACID transactions, time travel, and schema enforcement. Because all engines (Spark, T-SQL, Power BI) read the same Delta format, there’s no data duplication between systems.

Q: What are OneLake shortcuts and when would you use them?

Shortcuts are pointers to external data (ADLS Gen2, S3, GCS, Dataverse, or another OneLake location) that appear as folders in a lakehouse. The data isn’t copied. Use them when you want to query existing external data without migration, share data across workspaces, or create a logical federation layer.

Q: What are the main “experiences” in Microsoft Fabric?

Seven experiences: (1) Data Engineering (Spark, lakehouses), (2) Data Warehouse (T-SQL), (3) Data Science (ML notebooks, MLflow), (4) Data Factory (pipelines + Dataflows Gen2), (5) Real-Time Intelligence (Eventstreams, KQL), (6) Power BI (reports + dashboards), (7) Data Activator (trigger actions on data conditions).

Q: How does Fabric handle billing? What are Capacity Units (CUs)?

Fabric uses a single capacity-based billing model. You purchase a capacity SKU (F2, F4, F8...F2048) measured in CUs. All Fabric workloads (Spark, SQL, Power BI, pipelines) share the same capacity pool. Operations consume CUs; bursting and smoothing let short spikes exceed the baseline, but sustained overconsumption triggers throttling. No per-activity or per-query billing.
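The CU arithmetic behind this model can be sketched in a few lines. This is an illustrative calculation, not an official formula — the function names are made up, and only the F-SKU convention (an F64 capacity provides 64 CUs) comes from the answer above:

```python
# Illustrative sketch of capacity-unit accounting. SKU naming is real
# (F64 = 64 CUs); the helper names are hypothetical.
def cu_seconds_available(sku_cus: int, window_seconds: int) -> int:
    """Total CU-seconds a capacity provides over a time window."""
    return sku_cus * window_seconds

def utilization(consumed_cu_seconds: float, sku_cus: int, window_seconds: int) -> float:
    """Fraction of the capacity's CU budget consumed in the window."""
    return consumed_cu_seconds / cu_seconds_available(sku_cus, window_seconds)

# An F64 capacity over one hour provides 64 * 3600 = 230,400 CU-seconds.
# A Spark job that consumed 115,200 CU-seconds used 50% of that budget.
print(utilization(115_200, 64, 3600))  # 0.5
```

Smoothing spreads a burst's consumption over a longer window, which is why a short spike above the baseline doesn't immediately throttle the capacity.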
Learn more: Core Concepts covers OneLake, workspaces, and capacities in detail.

Lakehouse vs Warehouse

Q: What is the difference between a Fabric lakehouse and a Fabric warehouse?

Both store data as Delta Parquet in OneLake. The lakehouse supports Spark (PySpark, Spark SQL) for reads/writes and has a read-only SQL endpoint. The warehouse supports full T-SQL (INSERT, UPDATE, DELETE, stored procedures, cross-database queries). Choose lakehouse for data engineering/ML, warehouse for SQL-heavy analytics.

Q: What is the SQL endpoint of a lakehouse?

The SQL endpoint is an auto-generated, read-only T-SQL interface to the lakehouse’s Delta tables. You can connect with SSMS, Azure Data Studio, or any SQL client and run SELECT queries. It doesn’t support INSERT/UPDATE/DELETE — writes must go through Spark.

Q: When would you choose a warehouse over a lakehouse?

When your team primarily uses T-SQL, you need INSERT/UPDATE/DELETE/MERGE, stored procedures, cross-database queries, or you’re building a traditional star schema for BI. The warehouse gives SQL developers a familiar experience without learning Spark.

Q: What is the medallion architecture and how would you implement it in Fabric?

A 3-layer pattern: Bronze (raw data as-is), Silver (cleaned, validated, conformed), Gold (business aggregates for reporting). In Fabric, create separate lakehouses for each layer. Use pipelines + Spark notebooks to move data: Pipeline ingests to Bronze, Notebook transforms Bronze → Silver, Notebook aggregates Silver → Gold. Power BI connects to Gold via Direct Lake.
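The three-layer flow above can be sketched in plain Python. In Fabric this would be Spark notebooks reading and writing Delta tables; the data and function names here are purely illustrative:

```python
# Minimal pure-Python sketch of the medallion flow. In Fabric these steps
# would be Spark notebooks writing Delta tables in separate lakehouses.
bronze = [  # Bronze: raw ingested rows, kept as-is
    {"order_id": "1", "amount": "100.0", "region": "EU"},
    {"order_id": "2", "amount": "bad",   "region": "US"},
    {"order_id": "3", "amount": "50.5",  "region": "EU"},
]

def to_silver(rows):
    """Silver: clean and validate -- cast amounts, drop unparseable rows."""
    silver = []
    for r in rows:
        try:
            silver.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # a real pipeline would quarantine/log bad rows
    return silver

def to_gold(rows):
    """Gold: aggregate to business-level facts -- revenue per region."""
    gold = {}
    for r in rows:
        gold[r["region"]] = gold.get(r["region"], 0.0) + r["amount"]
    return gold

gold = to_gold(to_silver(bronze))
print(gold)  # {'EU': 150.5} -- the bad US row was dropped at Silver
```

The key design point survives the simplification: each layer has one responsibility, and a failure in cleaning (Silver) never corrupts the raw history (Bronze).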

Q: What is V-Order optimization in Fabric?

V-Order is a write-time optimization that sorts and compresses Parquet row groups for faster reads across all Fabric engines. It’s enabled by default — no manual configuration needed. It improves read performance for both the SQL engine and Power BI Direct Lake without affecting write performance significantly.
Learn more: Lakehouses & Warehouses deep dive covers Delta tables, Direct Lake, and shortcuts.

Direct Lake & Power BI

Q: What is Direct Lake mode in Power BI?

Direct Lake lets Power BI read Delta Parquet files directly from OneLake into the VertiPaq engine’s memory, without importing data or sending live queries. It combines import-speed performance with DirectQuery-level freshness.

Q: How does Direct Lake differ from Import and DirectQuery?

Import: copies data into the PBI model (fast, but stale until refresh). DirectQuery: sends SQL to the source for every interaction (fresh, but slow). Direct Lake: reads Parquet files directly from OneLake into memory on demand (fast AND fresh, no data copy).

Q: What happens when Direct Lake falls back to DirectQuery?

If Direct Lake can’t load the data (table too large for the SKU’s row limit, unsupported data types, or unsupported DAX features), it automatically falls back to DirectQuery. This is slower. Monitor fallbacks in Performance Analyzer and fix by optimizing tables, reducing cardinality, or upgrading the capacity SKU.

Q: What is a semantic model in Fabric?

A semantic model (formerly “dataset”) is the Power BI data model that sits between raw data and reports. It contains table relationships, DAX measures, hierarchies, and display formatting. Every lakehouse and warehouse auto-creates a default semantic model that Power BI can use immediately.

Q: What are the row count limits for Direct Lake, and how do they vary by SKU?

Row limits per table depend on the capacity SKU (the Direct Lake "guardrails"): F2–F32: ~300M rows per table; F64: ~1.5B; F128: ~3B; F256: ~6B; F512: ~12B. If a table exceeds the limit, Direct Lake falls back to DirectQuery for queries against that table. Solutions: aggregate, partition, or upgrade the SKU. Check current Microsoft documentation — these guardrails change over time.
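A fallback check can be expressed as a small lookup. The per-SKU numbers below are approximate and subject to change — treat them as assumptions to verify against current documentation:

```python
# Illustrative Direct Lake fallback check. The guardrail values are
# approximate assumptions, not authoritative figures.
GUARDRAILS = {  # max rows per table, in millions
    "F32": 300, "F64": 1_500, "F128": 3_000, "F256": 6_000, "F512": 12_000,
}

def falls_back_to_directquery(sku: str, table_rows: int) -> bool:
    """True if a table exceeds the Direct Lake row guardrail for this SKU."""
    limit = GUARDRAILS[sku] * 1_000_000
    return table_rows > limit

print(falls_back_to_directquery("F64", 2_000_000_000))  # True: 2B > 1.5B
print(falls_back_to_directquery("F64", 500_000_000))    # False
```

Note the guardrail is per table, not per model — one oversized fact table forces fallback only for queries that touch it.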

Data Pipelines & Dataflows

Q: What is the difference between Fabric pipelines and Dataflows Gen2?

Pipelines are orchestration engines (based on ADF) for chaining activities: copy data, run notebooks, branch logic, loop. Dataflows Gen2 are Power Query Online mashups for no-code/low-code ETL that write directly to lakehouses or warehouses. Pipelines can call Dataflows as an activity.

Q: How would you implement an incremental load in Fabric?

Watermark pattern: Store last-loaded timestamp in a control table. Lookup Activity reads the watermark. Copy Activity filters source rows WHERE modified_date > watermark. After success, update the watermark. For large tables, add partitioning on the watermark column for better Spark/SQL performance.
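The watermark pattern above can be simulated in plain Python. In Fabric, the lookup and copy would be pipeline activities and the control table would live in a warehouse; everything here is an illustrative stand-in:

```python
from datetime import datetime

# Pure-Python sketch of the watermark pattern. In Fabric: Lookup Activity
# reads `control`, Copy Activity applies the filter, a final activity
# advances the watermark.
control = {"orders": datetime(2024, 1, 1)}  # last-loaded watermark per table

source = [
    {"id": 1, "modified_date": datetime(2023, 12, 30)},  # already loaded
    {"id": 2, "modified_date": datetime(2024, 1, 5)},
    {"id": 3, "modified_date": datetime(2024, 1, 7)},
]

def incremental_load(table: str, rows):
    """Copy only rows newer than the watermark, then advance it."""
    watermark = control[table]
    new_rows = [r for r in rows if r["modified_date"] > watermark]
    if new_rows:  # advance only after a successful copy
        control[table] = max(r["modified_date"] for r in new_rows)
    return new_rows

loaded = incremental_load("orders", source)
print([r["id"] for r in loaded])  # [2, 3]
print(control["orders"])          # 2024-01-07 00:00:00
```

Updating the watermark only after a successful copy is what makes the pattern safe to re-run: a failed load leaves the watermark untouched, so the next run picks up the same rows.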

Q: What is the difference between Dataflow Gen1 and Gen2?

Gen1 outputs to Power BI datasets only. Gen2 outputs to Fabric lakehouses, warehouses, and KQL databases. Gen2 also has enhanced compute (faster for large data), OneLake staging, and can be used as a pipeline activity.

Q: How do Fabric pipelines compare to Apache Airflow?

Fabric pipelines: drag-and-drop UI, shared capacity billing, native Spark/PBI integration, Azure-only. Airflow: Python code (DAGs), open-source, cloud-agnostic, self-managed infrastructure. Fabric is better for Microsoft-centric teams. Airflow is better for multi-cloud or teams wanting full code control over orchestration.

Q: How would you handle errors in a Fabric pipeline?

Each activity has success/failure/completion paths. On failure: (1) Retry the activity (configurable retry count + interval). (2) Use If Condition to check error output and decide next steps. (3) Add a Web Activity on the failure path to send alerts (Teams webhook, email, PagerDuty). (4) Log errors to an audit table for monitoring. (5) Set the pipeline to succeed/fail based on whether errors are critical.
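Steps (1)–(3) can be sketched as a retry wrapper with a failure path. Fabric configures this declaratively on each activity; the function below is an illustrative equivalent with made-up names:

```python
import time

# Sketch of activity retry + failure-path alerting (hypothetical names;
# Fabric sets retry count/interval and failure paths in the pipeline UI).
def run_with_retry(activity, retries=3, interval_s=0.0, on_failure=None):
    """Run an activity, retrying on exception; fire the failure path if all fail."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return activity()
        except Exception as e:
            last_error = e
            time.sleep(interval_s)
    if on_failure:
        on_failure(last_error)  # e.g. post to a Teams webhook, write audit row
    raise last_error

calls = {"n": 0}
def flaky():
    """Fails twice, then succeeds -- a typical transient error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

result = run_with_retry(flaky, retries=3)
print(result, calls["n"])  # 'ok' on the third attempt
```

Re-raising after the failure path runs mirrors point (5): the alert fires, and the pipeline still fails when the error is unrecoverable.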
Learn more: Data Pipelines & Dataflows covers Copy Activity, orchestration patterns, and scheduling.

Governance & Security

Q: How does security work in Fabric? What are workspace roles?

Fabric uses workspace-level RBAC with four roles: Admin (full control), Member (create/edit/delete + share), Contributor (create/edit only), Viewer (read-only). For item-level security, you can grant per-item permissions. Row-Level Security (RLS) is supported in semantic models for restricting data by user.
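The four roles form a simple permission lattice, which can be made concrete with a lookup. The capability names below are simplified summaries of the answer above, not the official permission matrix:

```python
# Illustrative workspace-role capabilities (simplified, not the official
# matrix): each role adds capabilities on top of the one below it.
ROLE_CAPS = {
    "Viewer":      {"read"},
    "Contributor": {"read", "create", "edit"},
    "Member":      {"read", "create", "edit", "delete", "share"},
    "Admin":       {"read", "create", "edit", "delete", "share", "manage_access"},
}

def can(role: str, action: str) -> bool:
    """Check whether a workspace role permits an action."""
    return action in ROLE_CAPS.get(role, set())

print(can("Contributor", "share"))  # False -- sharing needs Member or Admin
print(can("Member", "delete"))      # True
```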

Q: How does Fabric integrate with Microsoft Purview?

Purview provides data governance for Fabric: data cataloging (discover and search Fabric items), data lineage (trace data from source through transformations to reports), sensitivity labels (classify and protect sensitive data), and access policies. Purview scans OneLake metadata automatically.

Q: What is Row-Level Security (RLS) in Fabric and how do you implement it?

RLS restricts which rows a user can see in a report. Implemented in the semantic model using DAX filter expressions (e.g., [SalesRepEmail] = USERPRINCIPALNAME() so each salesperson sees only their own rows). Create roles in Power BI Desktop, define filter rules, assign users/groups to roles in the Power BI service. Works with both Import and Direct Lake modes.
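Conceptually, an RLS rule such as a DAX filter on the signed-in user's principal name is just a per-user row filter. A pure-Python simulation (all data and column names invented for illustration):

```python
# Pure-Python simulation of row-level security: a DAX rule like
# [SalesRepEmail] = USERPRINCIPALNAME() acts as a per-user row filter.
rows = [
    {"region": "EU", "sales_rep_email": "ana@contoso.com", "amount": 100},
    {"region": "US", "sales_rep_email": "bob@contoso.com", "amount": 200},
    {"region": "EU", "sales_rep_email": "ana@contoso.com", "amount": 50},
]

def apply_rls(rows, user_principal_name: str):
    """Return only the rows the signed-in user is allowed to see."""
    return [r for r in rows if r["sales_rep_email"] == user_principal_name]

visible = apply_rls(rows, "ana@contoso.com")
print(sum(r["amount"] for r in visible))  # 150 -- only Ana's two rows
```

The important operational detail is unchanged from the real feature: the filter is evaluated inside the semantic model at query time, so every visual the user opens is already restricted.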

Q: How do deployment pipelines work in Fabric?

Deployment pipelines promote Fabric items (lakehouses, warehouses, reports, pipelines) between Dev → Test → Prod workspaces. You compare stages, select items to deploy, and Fabric handles the promotion. Supports deployment rules to swap connection strings or parameters between environments.

Scenario Questions

Q: Your company has 50TB of data in Azure Data Lake Gen2 and wants to migrate to Fabric. How would you approach this?

Phase 1: Create OneLake shortcuts to the existing ADLS Gen2 data (no data movement). Phase 2: Convert frequently queried data to Delta format using Spark notebooks. Phase 3: Gradually migrate pipelines from ADF to Fabric pipelines. Phase 4: Build Power BI reports using Direct Lake on the lakehouse. Shortcuts allow a gradual migration without a big-bang cutover.

Q: Design a real-time analytics solution in Fabric for an e-commerce company tracking orders.

Architecture: (1) Orders stream from the app via Event Hub. (2) Fabric Eventstream ingests events into a KQL database for real-time queries. (3) A scheduled pipeline batch-loads orders into a bronze lakehouse daily. (4) Spark notebooks transform bronze → silver → gold. (5) Power BI uses Direct Lake on gold for historical dashboards + KQL visual for live order tracking. (6) Data Activator monitors for anomalies (order spike, payment failures) and triggers Teams alerts.

Q: A Power BI report on Fabric is slow. Walk through your troubleshooting steps.

(1) Check if Direct Lake is falling back to DirectQuery (Performance Analyzer). (2) Check table sizes vs SKU row limits. (3) Check DAX measure complexity (use DAX Studio). (4) Check if tables need optimization (OPTIMIZE + ZORDER). (5) Verify V-Order is enabled. (6) Check capacity utilization (are other workloads consuming too many CUs?). (7) Consider aggregating Gold layer tables to reduce cardinality. (8) Upgrade capacity SKU if at limits.

Q: How would you set up a multi-team Fabric environment with proper governance?

(1) Create separate workspaces per team (Marketing Analytics, Sales Analytics, etc.). (2) Assign workspace roles (Admins for leads, Contributors for developers, Viewers for business users). (3) Create shared lakehouses for cross-team data, accessed via shortcuts. (4) Use deployment pipelines (Dev → Test → Prod) per team. (5) Enable Purview for data cataloging and lineage. (6) Apply sensitivity labels to PII data. (7) Implement RLS in semantic models for row-level access control.

Q: Your Fabric capacity is being throttled. How do you diagnose and resolve it?

(1) Open the Fabric Capacity Metrics app to see CU consumption by workload. (2) Identify the top consumers (a runaway Spark notebook? Heavy pipeline? Complex PBI report?). (3) Optimize the offending workload (smaller clusters, better queries, aggregation). (4) Schedule heavy jobs during off-peak hours. (5) Use smoothing/throttling policies in capacity settings. (6) If consistently at capacity, upgrade the SKU or split workloads across multiple capacities.
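Step (2) — identifying the top consumers — is essentially a sort over the Capacity Metrics data. A sketch with entirely made-up workload names and numbers:

```python
# Illustrative version of step (2): rank workloads by CU consumption.
# Workload names and CU-second figures are invented for the example.
consumption = {
    "spark_notebook_ingest": 41_000,
    "pbi_sales_report": 12_500,
    "pipeline_nightly_load": 8_200,
    "kql_dashboard": 1_100,
}

def top_consumers(usage: dict, n: int = 2):
    """Workloads sorted by CU-seconds consumed, highest first."""
    return sorted(usage, key=usage.get, reverse=True)[:n]

print(top_consumers(consumption))  # ['spark_notebook_ingest', 'pbi_sales_report']
```

In practice the Fabric Capacity Metrics app gives you this ranking directly; the point is that remediation (step 3 onward) should start with the heaviest consumers, not the most recently deployed items.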

Q: Compare Microsoft Fabric vs Databricks for a company currently on Azure. When would you recommend each?

Fabric: Best when the team is Microsoft-centric (Power BI, M365, Azure AD), wants a single SaaS with unified billing, needs tight Power BI integration (Direct Lake), and prefers managed infrastructure. Databricks: Best when the team needs advanced ML (MLflow, AutoML, Feature Store), multi-cloud (AWS/GCP/Azure), more control over Spark clusters, or uses open-source heavily. Many enterprises use both — Databricks for heavy data engineering/ML, Fabric for BI and self-service analytics.