Databases & ORMs

TL;DR

SQL databases (PostgreSQL, MySQL) store structured data in tables with strict schemas and ACID guarantees. NoSQL databases (MongoDB, Redis, DynamoDB) trade structure for flexibility and scale. ORMs let you talk to databases using your programming language instead of raw SQL. Choose based on your data shape, consistency needs, and query patterns.

The Big Picture

Every backend application needs to store and retrieve data. The database landscape has two major camps — SQL and NoSQL — with ORMs sitting on top as a translation layer between your code and the database engine.

Database landscape: SQL vs NoSQL types, ORM layer, and connection pooling

Explain Like I'm 12

A database is like a giant filing cabinet for your app. SQL databases are like a perfectly organized cabinet — every file has a label, goes in the right drawer, and follows strict rules. NoSQL databases are like a big box — throw anything in, it's flexible, but finding stuff can be harder. An ORM is like a translator who speaks both your language and the cabinet's language, so you never have to learn the cabinet's weird filing system yourself.

SQL vs NoSQL

This is the first decision you'll make for every project: do you need strict structure or flexible schemas? Here's the honest breakdown.

Feature	SQL (Relational)	NoSQL
Data Model	Tables with rows and columns	Documents, key-value, columns, or graphs
Schema	Fixed schema, enforced by the DB	Flexible / schema-less
Scaling	Vertical (bigger server)	Horizontal (more servers)
Consistency	Strong (ACID)	Eventual (BASE), varies by DB
Relationships	JOINs across tables	Denormalized / embedded
Best For	Structured data, complex queries, transactions	Flexible data, high throughput, rapid iteration

Database Types at a Glance

Category	Examples	Data Model	Best For
Relational (SQL)	PostgreSQL, MySQL, SQLite	Tables, rows, columns	Most apps, transactions, reporting
Document	MongoDB, CouchDB	JSON-like documents	CMS, catalogs, user profiles
Key-Value	Redis, DynamoDB	Simple key → value pairs	Caching, sessions, leaderboards
Column-Family	Cassandra, ScyllaDB	Wide columns, sparse rows	Time series, IoT, massive writes
Graph	Neo4j, Amazon Neptune	Nodes and edges	Social networks, recommendations

Tip: Start with PostgreSQL. It's the Swiss Army knife — relational, JSON support, full-text search, and free. You can model 90% of applications with Postgres alone. Only reach for a specialized database when you have a specific problem Postgres can't solve efficiently.

CRUD Operations

CRUD stands for Create, Read, Update, Delete — the four fundamental operations every database supports. Let's compare how they look in SQL versus MongoDB so you can see the mental model shift between relational and document databases.

Create

-- SQL: Insert a row into a table
INSERT INTO users (name, email, age)
VALUES ('Alice', '[email protected]', 30);

// MongoDB: Insert a document into a collection
db.users.insertOne({
  name: "Alice",
  email: "[email protected]",
  age: 30
});

Read

-- SQL: Query with a filter
SELECT name, email FROM users
WHERE age > 25
ORDER BY name;

// MongoDB: Find with a filter
db.users.find(
  { age: { $gt: 25 } },
  { name: 1, email: 1 }
).sort({ name: 1 });

Update

-- SQL: Update matching rows
UPDATE users
SET email = '[email protected]'
WHERE name = 'Alice';

// MongoDB: Update matching documents
db.users.updateOne(
  { name: "Alice" },
  { $set: { email: "[email protected]" } }
);

Delete

-- SQL: Delete matching rows
DELETE FROM users
WHERE age < 18;

// MongoDB: Delete matching documents
db.users.deleteMany({ age: { $lt: 18 } });

Key difference: SQL thinks in rows and tables with a fixed schema. MongoDB thinks in documents (JSON objects) with flexible structure. SQL requires you to define the shape of your data upfront; MongoDB lets you store whatever you want. Both approaches have trade-offs — flexibility vs. data integrity.

ORMs: The Database Translator

An ORM (Object-Relational Mapper) lets you interact with your database using your programming language's objects and methods instead of writing raw SQL strings. You define a model class, and the ORM generates the SQL behind the scenes.

Pros and Cons

Pros	Cons
Write Python/JS instead of SQL	Hides what SQL is actually running
Database-agnostic (switch DBs easily)	Complex queries are harder than raw SQL
Built-in protection against SQL injection	Performance overhead for large datasets
Auto-generates migrations	Learning curve for the ORM itself
Type safety and IDE autocompletion	Can encourage N+1 queries (see below)

Popular ORMs

ORM	Language	Style
SQLAlchemy	Python	Full-featured, supports raw SQL + ORM patterns
Django ORM	Python	Batteries-included, tightly integrated with Django
Prisma	Node.js / TypeScript	Type-safe, modern, auto-generated client
Sequelize	Node.js	Mature, promise-based, supports multiple DBs

SQLAlchemy Example

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# Define a model (maps to a database table)
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    email = Column(String(200), unique=True)

# Connect and create tables
engine = create_engine("postgresql://user:pass@localhost/mydb")
Base.metadata.create_all(engine)

# Create a user (INSERT)
with Session(engine) as session:
    user = User(name="Alice", email="[email protected]")
    session.add(user)
    session.commit()

# Query users (SELECT)
with Session(engine) as session:
    users = session.query(User).filter(User.name == "Alice").all()
    for u in users:
        print(u.name, u.email)

Reality check: ORMs are great for CRUD, but complex queries often need raw SQL. Most production codebases use a mix — ORM for standard operations, raw SQL for complex reports and analytics. Don't fight the ORM when SQL is clearer.

The N+1 Problem

The N+1 problem is the single most common performance trap when using ORMs. It happens when your code fetches a list of records, then makes a separate database query for each record's related data. One query to get the list (1), plus one query per item (N) = N+1 queries.

The Problem: N+1 Queries

Imagine you want to display all users and their posts:

# BAD: N+1 queries
# Query 1: Get all users
users = session.query(User).all()

# Queries 2 to N+1: Get posts for EACH user
for user in users:
    # This fires a SEPARATE query for each user!
    print(f"{user.name}: {len(user.posts)} posts")

If you have 100 users, this fires 101 database queries. With 10,000 users, it fires 10,001. Your page load time goes from milliseconds to seconds.

The Fix: Eager Loading

# GOOD: 1 query with eager loading (JOIN)
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.posts)
).all()

# Now posts are already loaded — no extra queries
for user in users:
    print(f"{user.name}: {len(user.posts)} posts")

The Fix in Raw SQL

-- Instead of querying users, then posts separately:
-- Query 1: SELECT * FROM users;
-- Query 2-N: SELECT * FROM posts WHERE user_id = ?;

-- Use a single JOIN:
SELECT users.name, COUNT(posts.id) AS post_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
GROUP BY users.name;

Warning: The N+1 problem is the #1 ORM performance killer. Always check your query count. Most ORMs have logging options to show the actual SQL being executed — enable them during development. If you see the same query repeated dozens of times, you've got an N+1.

Indexing Strategies

An index is a data structure that makes queries faster — like the index at the back of a textbook. Without it, the database scans every single row (a "full table scan"). With it, the database jumps straight to the matching rows.

Index Types

Index Type	How It Works	Best For
B-tree (default)	Balanced tree, sorted data	Equality, range queries, ORDER BY, LIKE 'prefix%'
Hash	Hash table lookup	Exact equality only (=), very fast for lookups
GIN	Generalized Inverted Index	Full-text search, JSONB, arrays (PostgreSQL)
Composite	Multi-column B-tree	Queries that filter/sort on multiple columns

Creating Indexes

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only index active users)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';

-- Check if your query uses the index
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

Performance Impact

Scenario	Without Index	With Index
Find user by email (1M rows)	~500ms (full table scan)	~1ms (index lookup)
Find orders by user + date range	~2s (scanning all orders)	~5ms (composite index)
INSERT a new row	~1ms	~2ms (must update index too)

Tip: Index columns you filter, sort, and join on. Don't index everything — each index slows down writes because the database must update the index on every INSERT, UPDATE, and DELETE. A table with 10 indexes writes slower than one with 2. Profile first, index second.

When NOT to Index

Low-cardinality columns — A boolean is_active column with only 2 values won't benefit much from an index.
Tiny tables — If the table has fewer than 1,000 rows, a full scan is already fast.
Write-heavy tables — If you're inserting millions of rows per second, every index slows writes.
Columns you never query on — Indexes only help the columns you actually filter or sort by.

Connection Pooling

Opening a database connection is expensive. It involves DNS resolution, TCP handshake, TLS negotiation, and authentication — all before you send a single query. A typical connection takes 50–200ms to establish. If every API request opens a new connection, your app crumbles under load.

A connection pool solves this by maintaining a set of pre-opened connections. When your code needs to talk to the database, it borrows a connection from the pool, uses it, and returns it. No setup cost, no teardown waste.

How It Works

Pool size — The number of connections kept open (e.g., 10–20).
Borrow — Your code requests a connection; the pool hands one over instantly.
Return — After the query, the connection goes back to the pool (not closed).
Overflow — If all connections are in use, requests wait in a queue or create temporary overflow connections.

Pool Configuration Example

# SQLAlchemy connection pool
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@localhost/mydb",
    pool_size=10,        # Keep 10 connections open
    max_overflow=20,     # Allow 20 extra under heavy load
    pool_timeout=30,     # Wait max 30s for a connection
    pool_recycle=1800,   # Recycle connections every 30 min
    pool_pre_ping=True,  # Test connections before use
)

Pool sizing formula: A good starting point is pool_size = (2 * CPU cores) + number_of_disks. For a 4-core server with SSDs, that's about 10 connections. More isn't always better — too many connections actually hurt performance because the database spends time context-switching between them instead of executing queries.

Migrations

A migration is a versioned script that changes your database schema — adding a table, renaming a column, creating an index. Migrations let you evolve your database the same way you evolve your code: incrementally, reversibly, and tracked in version control.

Why Migrations Matter

Reproducibility — Any developer can run migrations to get an identical database schema.
Version control — Schema changes are tracked in git alongside code changes.
Rollbacks — Each migration has an "up" (apply) and "down" (revert) operation.
Team collaboration — Multiple developers can evolve the schema without conflicts.

Migration Tools

Tool	Language	Works With
Alembic	Python	SQLAlchemy
Prisma Migrate	Node.js / TypeScript	Prisma ORM
Flyway	Java / Any	Any SQL database
Django Migrations	Python	Django ORM

Migration Example (Alembic)

"""Add email column to users table

Revision ID: a1b2c3d4
Create Date: 2026-04-01
"""
from alembic import op
import sqlalchemy as sa

def upgrade():
    # Apply the change
    op.add_column('users', sa.Column('email', sa.String(200)))
    op.create_unique_constraint('uq_users_email', 'users', ['email'])

def downgrade():
    # Revert the change
    op.drop_constraint('uq_users_email', 'users')
    op.drop_column('users', 'email')

# Generate a migration from model changes
alembic revision --autogenerate -m "Add email column"

# Apply all pending migrations
alembic upgrade head

# Rollback the last migration
alembic downgrade -1

Warning: Never modify a production database without a migration. No "just run this ALTER TABLE real quick." Every schema change goes through a migration file, gets code reviewed, tested in staging, and then deployed. Manual changes are invisible to your team and impossible to roll back.

Transactions & ACID

A transaction groups multiple database operations into a single unit. Either all operations succeed, or none of them do. This is critical for data integrity — you never want money to vanish mid-transfer.

ACID Properties

Property	Meaning	Real-World Analogy
Atomicity	All or nothing — if any step fails, everything rolls back	A bank transfer either moves money or doesn't — no in-between state
Consistency	The database always moves from one valid state to another	Account balances can never go negative if that's a rule
Isolation	Concurrent transactions don't interfere with each other	Two people transferring money at the same time don't see partial results
Durability	Once committed, the data survives crashes and power failures	After the bank confirms the transfer, it's permanent — even if the power goes out

Transaction Example: Money Transfer

-- Transfer $100 from Alice to Bob
BEGIN TRANSACTION;

-- Debit Alice
UPDATE accounts SET balance = balance - 100
WHERE name = 'Alice' AND balance >= 100;

-- Credit Bob
UPDATE accounts SET balance = balance + 100
WHERE name = 'Bob';

-- If both succeeded, make it permanent
COMMIT;

-- If anything failed, undo everything
-- ROLLBACK;

# Same transaction in SQLAlchemy
from sqlalchemy.orm import Session

with Session(engine) as session:
    try:
        alice = session.query(Account).filter_by(name="Alice").one()
        bob = session.query(Account).filter_by(name="Bob").one()

        if alice.balance < 100:
            raise ValueError("Insufficient funds")

        alice.balance -= 100
        bob.balance += 100

        session.commit()  # All or nothing
    except Exception:
        session.rollback()  # Undo everything
        raise

When to use transactions: Any time multiple related operations must succeed or fail together. Bank transfers, order placement (create order + decrement inventory + charge payment), and user registration (create user + create profile + send welcome email trigger) are classic examples.

Choosing the Right Database

Don't overthink this. Most applications start with PostgreSQL and add specialized databases as needed. Here's a practical decision guide:

Structured data with relationships? → PostgreSQL. Users, orders, products, anything with foreign keys.
Document-like, flexible data? → MongoDB. CMS content, product catalogs with varying attributes, user-generated data.
Caching or sessions? → Redis. In-memory, blazing fast, perfect for short-lived data.
Complex relationships matter most? → PostgreSQL (or Neo4j for extreme cases like social graphs).
Massive scale, simple queries? → DynamoDB or Cassandra. When you need to handle millions of writes per second with simple key-value lookups.
Full-text search? → PostgreSQL (built-in) or Elasticsearch (dedicated search engine).
Time-series data? → TimescaleDB (PostgreSQL extension) or InfluxDB.

The 80/20 rule: PostgreSQL handles 80% of use cases. Start there. Add Redis for caching when performance matters. Add a specialized database only when you hit a clear limitation. Don't pick MongoDB because it's "cool" — pick it because your data genuinely doesn't fit a relational model.

Test Yourself

Q1: When would you choose MongoDB over PostgreSQL?

Choose MongoDB when your data is document-shaped (nested, varying structure) and you don't need complex JOINs or strict transactions. Good examples: a CMS where each article has different metadata fields, a product catalog where electronics have different attributes than clothing, or a prototype where the schema is evolving rapidly. If your data has clear relationships (users → orders → items) and you need ACID transactions, stick with PostgreSQL.

Q2: What is the N+1 problem and how do you fix it?

The N+1 problem occurs when you query a list of N records, then make a separate query for each record's related data. For 100 users with posts, that's 1 query for users + 100 queries for posts = 101 queries. The fix is eager loading — use a JOIN or the ORM's eager loading option (e.g., joinedload in SQLAlchemy, include in Prisma, select_related / prefetch_related in Django) to fetch everything in 1–2 queries.

Q3: Explain ACID with a real-world example.

ACID describes the guarantees of a database transaction. Using a bank transfer of $100 from Alice to Bob: Atomicity — both the debit and credit happen, or neither does (no money disappears). Consistency — the total money in the system stays the same; balances can't go negative if that's a constraint. Isolation — if two transfers happen simultaneously, they don't interfere (no double-spending). Durability — once the transfer is confirmed (committed), it's permanent even if the server crashes.

Q4: Why is connection pooling important?

Opening a database connection is expensive (50–200ms for DNS, TCP handshake, TLS, and authentication). Without pooling, every API request opens and closes a connection, wasting time and overwhelming the database with connection overhead. A connection pool keeps a set of pre-opened connections ready. Your code borrows one instantly, uses it, and returns it. This reduces latency, limits the total number of connections to the database, and lets your app handle high traffic without running out of database connections.

Q5: What's the trade-off of adding an index?

Indexes make reads faster but writes slower. Every INSERT, UPDATE, and DELETE must also update the index, which takes extra time and disk space. A table with 10 indexes writes significantly slower than one with 2. The trade-off is worth it for columns you frequently filter, sort, or join on. Don't index columns with low cardinality (like booleans), tiny tables, or columns you never query. Profile your slow queries first, then add targeted indexes.

Interview Questions

Q1: You have a slow query that takes 5 seconds. Walk me through how you'd debug it.

Step 1: EXPLAIN ANALYZE — Run the query with EXPLAIN ANALYZE to see the execution plan. Look for full table scans (Seq Scan) where you expect index usage.

Step 2: Check indexes — Are the columns in WHERE, JOIN, and ORDER BY indexed? If not, create targeted indexes.

Step 3: Check the query itself — Is it selecting SELECT * when it only needs 2 columns? Is there a missing WHERE clause pulling millions of rows? Are there unnecessary JOINs?

Step 4: Check data volume — Is the table huge? Consider pagination, archiving old data, or partitioning.

Step 5: Check N+1 — If this is an ORM query, enable query logging. Are you seeing hundreds of similar queries instead of one JOIN?

Step 6: Check connection and server — Is the database server under heavy load? Are connections maxed out? Check pg_stat_activity for locks or long-running queries.

Q2: Design the database schema for a social media app (users, posts, comments, likes).

Users table: id (PK), username (unique), email (unique), password_hash, bio, avatar_url, created_at.

Posts table: id (PK), user_id (FK → users), content (text), image_url, created_at. Index on user_id + created_at for "user's recent posts" queries.

Comments table: id (PK), post_id (FK → posts), user_id (FK → users), content, created_at. Index on post_id.

Likes table: id (PK), user_id (FK → users), post_id (FK → posts), created_at. Unique constraint on (user_id, post_id) to prevent double-liking. Index on post_id for counting likes.

Follows table: follower_id (FK → users), following_id (FK → users), created_at. Composite PK on both columns. Indexes on both columns for "who follows me" and "who do I follow" queries.

For the feed, use a fanout-on-read approach: query posts from followed users with JOIN follows ON follows.following_id = posts.user_id WHERE follows.follower_id = ?.

Q3: Explain the CAP theorem and how it affects database choice.

The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously:

Consistency — Every read returns the most recent write.
Availability — Every request gets a response (even if it's not the latest data).
Partition tolerance — The system works even if network connections between nodes fail.

Since network partitions are unavoidable in distributed systems, you're really choosing between CP (consistency + partition tolerance) and AP (availability + partition tolerance).

CP databases (PostgreSQL, MongoDB with majority reads): Choose consistency — during a partition, some requests may fail rather than return stale data. Best for banking, inventory, anything where correctness matters.

AP databases (Cassandra, DynamoDB, CouchDB): Choose availability — every request gets a response, but data may be slightly stale. Best for social feeds, analytics, shopping carts where "close enough" is fine.

Q4: What happens if two users try to buy the last item simultaneously?

Without proper handling, both users could buy the same item (a race condition). Here are three solutions:

1. Pessimistic locking (SELECT FOR UPDATE): Lock the row when the first user reads it. The second user waits until the first transaction completes. Prevents the issue but reduces throughput.

2. Optimistic locking (version column): Add a version column. When updating, include WHERE version = expected_version. If another transaction already changed it, the UPDATE affects 0 rows and you know to retry or show "sold out."

3. Atomic UPDATE: Use UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ? AND quantity > 0. The database handles concurrency — if quantity is already 0, the WHERE clause prevents the update. Check the affected row count to know if you got the item.

The atomic UPDATE approach is simplest and works for most e-commerce scenarios. Pessimistic locking is safer for complex multi-step transactions.

Databases & ORMs

The Big Picture

SQL vs NoSQL

Database Types at a Glance

CRUD Operations

Create

Read

Update

Delete

ORMs: The Database Translator

Pros and Cons

Popular ORMs

SQLAlchemy Example

The N+1 Problem

The Problem: N+1 Queries

The Fix: Eager Loading

The Fix in Raw SQL

Indexing Strategies

Index Types

Creating Indexes

Performance Impact

When NOT to Index

Connection Pooling

How It Works

Pool Configuration Example

Migrations

Why Migrations Matter

Migration Tools

Migration Example (Alembic)

Transactions & ACID

ACID Properties

Transaction Example: Money Transfer

Choosing the Right Database

Test Yourself

Interview Questions

Related Topics