Core Concepts of SQL

TL;DR

SQL has 6 key building blocks: SELECT (pick columns), WHERE (filter rows), JOIN (combine tables), GROUP BY (aggregate), Subqueries (nest queries), and Indexes (make it fast). Master these and you can query any database.

The Big Picture

Every SQL query is built from these pieces. They snap together like building blocks — start with SELECT, layer on the rest as needed.

SQL execution order: FROM/JOIN, WHERE, GROUP BY, HAVING, SELECT, ORDER BY/LIMIT
Explain Like I'm 12
Think of a database as a big spreadsheet. SELECT is choosing which columns to show. WHERE is highlighting only the rows you care about. JOIN is taping two spreadsheets side by side when they share a common column. GROUP BY is sorting your candy by color and counting each pile. Subqueries are like asking a question to answer another question: "Who scored higher than the average?" (first you need to find the average). Indexes are like the index at the back of a textbook — instead of reading every page, you jump straight to the right one.

Cheat Sheet

Concept Syntax Plain English
SELECT SELECT col1, col2 FROM table Give me these columns from this table
WHERE WHERE condition But only rows that match this condition
JOIN JOIN table2 ON t1.id = t2.id Combine with another table where IDs match
GROUP BY GROUP BY col HAVING condition Group rows together and summarize each group
Subquery WHERE col IN (SELECT ...) Use the result of one query inside another
Index CREATE INDEX idx ON table(col) Build a shortcut so the database finds rows faster

The 6 Building Blocks

SELECT — Pick Your Columns

Every SQL query starts with SELECT. It tells the database which columns you want back. Use * to grab everything, or list specific columns to keep things focused.

-- Get all columns
SELECT * FROM employees;

-- Get specific columns
SELECT name, email, department FROM employees;
Tip: Avoid SELECT * in production code. Listing columns is faster and makes your intent clear.

You can also rename columns on the fly with AS:

SELECT name AS employee_name, salary * 12 AS annual_salary
FROM employees;

WHERE — Filter Your Rows

WHERE narrows your results. Only rows that pass the condition get returned.

-- Simple comparison
SELECT name, salary FROM employees WHERE salary > 50000;

-- Multiple conditions
SELECT name FROM employees
WHERE department = 'Engineering' AND hire_date > '2024-01-01';

-- Pattern matching
SELECT name FROM employees WHERE name LIKE 'J%';

-- List of values
SELECT name FROM employees WHERE department IN ('Sales', 'Marketing');
Operators: =, !=, <, >, <=, >=, LIKE, IN, BETWEEN, IS NULL

JOIN — Combine Tables

Real data lives across multiple tables. JOIN lets you combine them by matching a shared column. This is one of SQL's most powerful features.

SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.id;

There are several types of JOINs (INNER, LEFT, RIGHT, FULL) — each handles unmatched rows differently.

Deep Dive: SQL JOINs →

GROUP BY — Aggregate Data

GROUP BY collapses rows into groups and lets you run aggregate functions (COUNT, SUM, AVG, MIN, MAX) on each group.

-- How many employees per department?
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;

-- Average salary per department, only show departments with 5+ people
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
Common mistake: WHERE filters rows before grouping. HAVING filters groups after grouping. Don't mix them up.

Subqueries — Query Inside a Query

A subquery is a SELECT inside another statement. Use it when you need to answer one question to ask another.

-- Who earns more than average?
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Find departments with no employees
SELECT department_name
FROM departments
WHERE id NOT IN (SELECT DISTINCT department_id FROM employees);
Tip: If a subquery returns one value, use = or >. If it returns many values, use IN or EXISTS.

Indexes — Make It Fast

Without an index, the database reads every single row to find your data (a "full table scan"). An index is like a book's index — it lets the database jump straight to the right rows.

-- Create an index on the email column
CREATE INDEX idx_employees_email ON employees(email);

-- Now this query is fast even with millions of rows
SELECT * FROM employees WHERE email = '[email protected]';
Trade-off: Indexes speed up reads but slow down writes (INSERT/UPDATE/DELETE) because the index must be updated too. Index columns you search on frequently, not every column.

How SQL Executes (The Secret Order)

You write SQL in one order, but the database executes it in a different order. Understanding this clears up most SQL confusion:

1
FROM / JOIN
Pick the tables and combine them
2
WHERE
Filter individual rows
3
GROUP BY
Collapse into groups
4
HAVING
Filter groups
5
SELECT
Pick columns and compute expressions
6
ORDER BY / LIMIT
Sort and cap results

This is why you can't use a column alias from SELECT in your WHERE clause — WHERE runs before SELECT.

Test Yourself

Q: What does SELECT do?

Picks which columns to return from the query result.

Q: What's the difference between WHERE and HAVING?

WHERE filters individual rows before grouping. HAVING filters groups after GROUP BY.

Q: When would you use a subquery?

When you need the result of one query to answer another. Example: finding all employees who earn more than the average salary.

Q: Why shouldn't you index every column?

Indexes speed up reads but slow down writes (INSERT/UPDATE/DELETE). Only index columns you search on frequently.

Q: In what order does the database actually execute a SQL query?

FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY/LIMIT. (Not the order you write it!)

Interview Questions

Q: What is the difference between WHERE and HAVING?

WHERE filters rows before grouping (operates on individual rows). HAVING filters groups after GROUP BY (operates on aggregated results). You can't use aggregate functions in WHERE — use HAVING instead. Example: HAVING COUNT(*) > 5 works, but WHERE COUNT(*) > 5 is a syntax error.

Q: What is the order of execution of a SQL query?

FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT. This is different from the order you write it. This is why you can't use a SELECT alias in WHERE (WHERE runs before SELECT).

Q: When would you use a subquery instead of a JOIN?

Use a subquery when: (1) you need a single aggregated value for comparison (e.g., salary > avg), (2) you need to check existence (EXISTS), (3) the inner query is independent and reusable. Use a JOIN when: (1) you need columns from both tables in the output, (2) performance matters (JOINs are often faster), (3) you're combining many tables. In many cases, both work — use whichever is more readable.

Q: Explain the difference between clustered and non-clustered indexes.

A clustered index determines the physical order of data on disk — a table can have only one (usually the primary key). A non-clustered index is a separate structure with pointers back to the data rows — a table can have many. Think of a clustered index as the order of pages in a dictionary, and a non-clustered index as the index at the back of a textbook.

Q: Write a query to find the 3rd highest salary without using LIMIT or window functions.

SELECT DISTINCT salary FROM employees e1
WHERE 3 = (
  SELECT COUNT(DISTINCT salary) FROM employees e2
  WHERE e2.salary >= e1.salary
);

This correlated subquery counts how many distinct salaries are ≥ each salary. The one with exactly 3 is the 3rd highest.