Microsoft Fabric Core Concepts

TL;DR

Fabric has 7 building blocks: OneLake (unified storage), Workspaces (project folders), Capacities (compute you pay for), Lakehouse (files + Delta tables), Warehouse (T-SQL analytics), Pipelines & Dataflows (ETL), and Power BI (reports). Everything lives in OneLake and shares one security model.

[Diagram: Fabric building blocks. OneLake at the center, connected to Lakehouse, Warehouse, Notebooks, Pipelines, and Power BI, all within a Workspace powered by Capacity Units.]

Explain Like I'm 12

Think of Fabric as a huge shared Google Drive for data. OneLake is the drive itself. Workspaces are folders — each team gets one. Capacities are like how much electricity the building gets (more capacity = more people can work at once). Inside each folder you have a Lakehouse (a messy inbox that auto-organizes), a Warehouse (a perfectly organized filing cabinet), Pipelines (mail carriers that deliver data), and Power BI (a printer that turns data into pretty charts).

Cheat Sheet

| Concept | What It Does | Plain English |
| --- | --- | --- |
| OneLake | Unified ADLS Gen2 storage for all Fabric items | “One shared hard drive for all your data” |
| Workspace | Container for items (lakehouses, warehouses, reports) | “A project folder with access controls” |
| Capacity | Compute resources measured in CUs (Capacity Units) | “The engine that powers everything — bigger engine = faster” |
| Lakehouse | Schema-on-read store with files + Delta tables + SQL endpoint | “Dump data in, query it with Spark or SQL” |
| Warehouse | Full T-SQL data warehouse with DML support | “A traditional SQL warehouse, fully managed” |
| Pipeline | ADF-like data orchestration (copy, transform, schedule) | “Automated data delivery — move data from A to B on a schedule” |
| Dataflow Gen2 | Power Query Online for no-code ETL into lakehouses | “Excel-like drag-and-drop data cleaning that writes to OneLake” |
| Notebook | PySpark/Spark SQL notebook for data engineering + ML | “Jupyter notebook connected to your lakehouse” |
| Semantic Model | Power BI data model (formerly “dataset”) with DAX measures | “The brain behind your Power BI reports” |
| Direct Lake | Power BI reads Delta tables in OneLake directly — no import/copy | “Reports load from the lake at import speed without importing” |

1. OneLake

OneLake is Fabric’s single storage layer. It’s built on Azure Data Lake Storage Gen2 but abstracted away so you never manage storage accounts. Every Fabric tenant gets exactly one OneLake — every workspace, lakehouse, and warehouse writes to it.

Key point: OneLake stores all tabular data as Delta Parquet (the Files section can hold any format). This open format means any engine (Spark, T-SQL, Power BI) can read the same data without copying.

Shortcuts

OneLake shortcuts let you reference external data (ADLS Gen2, S3, GCS, Dataverse) without moving it. The shortcut appears as a folder in your lakehouse, but the data stays where it is. This is Fabric’s answer to “I don’t want to copy terabytes.”

```sql
-- A table shortcut (here pointing at an S3 bucket) surfaces as a regular
-- table on the lakehouse SQL endpoint, so you query it like local data
SELECT * FROM lakehouse1.dbo.sales_2024
WHERE region = 'US';
```
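Shortcut data is also reachable through a OneLake URI, which any ABFS-capable reader (such as Spark) can load. A minimal sketch of the path layout, assuming placeholder workspace and item names:

```python
# Sketch of the OneLake URI pattern: one storage account ("onelake"), one
# filesystem per workspace, items addressed as <name>.<type>. All names
# below (sales_ws, lakehouse1, the shortcut folder) are placeholders.
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )

path = onelake_table_path("sales_ws", "lakehouse1",
                          "shortcut_to_s3_bucket/sales_2024")
# In a notebook, a Spark reader could then load it:
# spark.read.format("delta").load(path)
```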

2. Workspaces

A workspace is a container for Fabric items — lakehouses, warehouses, notebooks, pipelines, reports. Think of it as a project folder with role-based access control (Admin, Member, Contributor, Viewer).

| Role | Can Do |
| --- | --- |
| Admin | Everything + manage workspace settings and access |
| Member | Create, edit, delete all items + share items |
| Contributor | Create, edit, and delete items, but cannot share or manage access |
| Viewer | View items only (read reports, browse data) |

Best practice: Use separate workspaces for Dev, Test, and Prod. Fabric supports deployment pipelines to promote items between them.
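The role hierarchy above can be summarized as a simple lookup. This is only an illustration mirroring the table — the action names and helper are not a Fabric API:

```python
# Toy permission model mirroring the workspace-role table (illustrative only;
# these action strings and this helper are not part of any Fabric SDK).
ROLE_ACTIONS = {
    "Viewer":      {"view"},
    "Contributor": {"view", "create", "edit", "delete"},
    "Member":      {"view", "create", "edit", "delete", "share"},
    "Admin":       {"view", "create", "edit", "delete", "share", "manage_access"},
}

def can(role: str, action: str) -> bool:
    """Return whether a workspace role permits an action."""
    return action in ROLE_ACTIONS.get(role, set())
```

For example, `can("Contributor", "share")` is `False`: Contributors can work on items but cannot share them or manage access.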

3. Capacities

A capacity is the compute engine that powers your Fabric workloads. You buy a capacity SKU (F2, F4, F8 … F2048), and all workspaces assigned to that capacity share its resources. Measured in CUs (Capacity Units).

| SKU | CUs | Typical Use |
| --- | --- | --- |
| F2 | 2 | Dev/test, small team exploration |
| F64 | 64 | Production workloads, department-level |
| F256+ | 256+ | Enterprise, heavy Spark/warehouse workloads |

Watch out: All Fabric experiences share the same capacity pool. A runaway Spark notebook can starve your Power BI reports. Use capacity settings to throttle or isolate workloads.
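The shared-pool risk is easy to see with rough arithmetic. A toy check, ignoring Fabric's real smoothing and bursting behavior (the workload numbers are made up):

```python
# Illustrative only: does summed CU demand fit the SKU? Real Fabric capacities
# smooth and burst usage, so this is a simplification for intuition.
SKU_CUS = {"F2": 2, "F4": 4, "F8": 8, "F64": 64, "F256": 256, "F2048": 2048}

def over_capacity(sku: str, workload_cus: list[float]) -> bool:
    """True when concurrent CU demand exceeds what the SKU provides."""
    return sum(workload_cus) > SKU_CUS[sku]

# A runaway Spark job (50 CUs) plus normal BI traffic (20 CUs) on an F64:
over_capacity("F64", [50, 20])   # True: 70 CUs demanded, 64 available
```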

4. Lakehouse

The lakehouse is Fabric’s core data store. It combines unstructured files and structured Delta tables in one place. You get:

  • Files section — drop CSVs, Parquet, JSON, images, anything
  • Tables section — Delta tables with ACID transactions
  • SQL endpoint — auto-generated read-only T-SQL interface to your Delta tables
  • Default semantic model — auto-created Power BI model for Direct Lake reporting
```python
# PySpark in a Fabric notebook — read a CSV from the Files section,
# write it to the Tables section as a managed Delta table
df = spark.read.csv("Files/raw/sales_2024.csv", header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").saveAsTable("sales_2024")
```

5. Warehouse

The Fabric warehouse is a full T-SQL data warehouse. Unlike the lakehouse SQL endpoint (read-only), the warehouse supports INSERT, UPDATE, DELETE, stored procedures, and cross-database queries.

| Feature | Lakehouse SQL Endpoint | Warehouse |
| --- | --- | --- |
| Read (SELECT) | Yes | Yes |
| Write (INSERT/UPDATE/DELETE) | No (read-only) | Yes |
| Stored procedures | No | Yes |
| Cross-database queries | No | Yes |
| Storage format | Delta Parquet (OneLake) | Delta Parquet (OneLake) |
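A sketch of the DML the warehouse supports (the `dbo.orders` table and its columns are hypothetical). Running the same statements against a lakehouse SQL endpoint fails, because the endpoint is read-only:

```sql
-- Full T-SQL DML works in a warehouse (hypothetical table for illustration)
INSERT INTO dbo.orders (order_id, region, amount)
VALUES (1001, 'US', 250.00);

UPDATE dbo.orders SET amount = 275.00 WHERE order_id = 1001;

DELETE FROM dbo.orders WHERE order_id = 1001;
```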

6. Pipelines & Dataflows Gen2

Pipelines are Fabric’s orchestration engine (based on Azure Data Factory). Use them to copy data from 100+ sources, run notebooks, trigger dataflows, and schedule everything.

Dataflows Gen2 are Power Query Online mashups that land data directly into lakehouses or warehouses. They’re the no-code/low-code option for data transformation.

When to use which: Use Pipelines for orchestration and data movement at scale. Use Dataflows Gen2 for citizen-developer ETL (business users who know Excel/Power Query but not code). Use Notebooks for complex Spark transformations.

7. Power BI & Direct Lake

Power BI is built into Fabric as a first-class experience. The game-changer is Direct Lake mode — Power BI reads Delta tables directly from OneLake without importing a copy. You get import-level speed with DirectQuery-level freshness.

| Mode | How It Works | Speed | Data Freshness |
| --- | --- | --- | --- |
| Import | Copies data into Power BI model | Fast | Stale until refresh |
| DirectQuery | Queries source on every interaction | Slow | Always live |
| Direct Lake | Reads Delta Parquet from OneLake directly | Fast | Near real-time |

Auto-generated semantic model: Every lakehouse and warehouse automatically creates a default semantic model. You can build Power BI reports on it immediately — no manual dataset setup required.

8. Medallion Architecture

Fabric encourages the medallion architecture (Bronze → Silver → Gold) for organizing data in lakehouses:

| Layer | Purpose | Data Quality |
| --- | --- | --- |
| Bronze | Raw ingestion — land data as-is from sources | Raw, unvalidated |
| Silver | Cleaned, deduplicated, joined — “single source of truth” | Validated, conformed |
| Gold | Business-level aggregates — ready for reports and dashboards | Curated, optimized |

Implementation: Create separate lakehouses for each layer (e.g., bronze_lakehouse, silver_lakehouse, gold_lakehouse), or use schemas within a single lakehouse. Notebooks or pipelines move data between layers.
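The flow between layers can be sketched in plain Python. In Fabric this work would be Spark notebooks or pipelines writing Delta tables between lakehouses; the rows here are made up:

```python
# Minimal, pure-Python sketch of medallion-style refinement (toy data).
bronze = [  # Bronze: raw ingestion, duplicates and bad rows land as-is
    {"order_id": 1, "region": "US", "amount": 100.0},
    {"order_id": 1, "region": "US", "amount": 100.0},   # duplicate
    {"order_id": 2, "region": "EU", "amount": None},    # fails validation
    {"order_id": 3, "region": "US", "amount": 50.0},
]

# Silver: drop invalid rows, deduplicate on the business key.
silver = list({r["order_id"]: r for r in bronze
               if r["amount"] is not None}.values())

# Gold: business-level aggregate (revenue per region), ready for a report.
gold: dict[str, float] = {}
for r in silver:
    gold[r["region"]] = gold.get(r["region"], 0.0) + r["amount"]
```

With this toy input, `silver` keeps two valid, distinct orders and `gold` is `{"US": 150.0}`.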

Test Yourself

Q: What is OneLake and how is it different from ADLS Gen2?

OneLake is Fabric’s built-in unified storage layer, built on top of ADLS Gen2 but fully managed. You don’t create storage accounts — every tenant gets one OneLake automatically. All Fabric items (lakehouses, warehouses, reports) store data in OneLake as Delta Parquet.

Q: What’s the difference between a lakehouse SQL endpoint and a warehouse?

The lakehouse SQL endpoint is read-only — you can SELECT but not INSERT/UPDATE/DELETE. The warehouse supports full T-SQL DML (INSERT, UPDATE, DELETE), stored procedures, and cross-database queries. Both store data as Delta Parquet in OneLake.

Q: What is Direct Lake mode in Power BI?

Direct Lake lets Power BI read Delta tables directly from OneLake without importing data. It combines the speed of Import mode with the freshness of DirectQuery. No data copy needed — the report always reflects the latest Delta table state.

Q: When would you use a pipeline vs a Dataflow Gen2 vs a notebook?

Pipeline: orchestration, scheduling, and data movement at scale (copy activity). Dataflow Gen2: no-code ETL for business users using Power Query Online. Notebook: complex PySpark transformations, data science, or anything requiring code-level control.

Q: What are the three layers of the medallion architecture?

Bronze: raw data as-is from sources. Silver: cleaned, deduplicated, validated data. Gold: business-level aggregates and curated datasets ready for dashboards and reports.