Data Visualization with Python
Create impactful charts with Matplotlib and Seaborn: bar, line, scatter, histogram, heatmap, box plots. Learn which chart type to use for which data, and how to make publication-quality visuals.
Explain Like I'm 12
Numbers in a spreadsheet are boring. Charts turn those numbers into pictures your brain can understand instantly. A bar chart tells you "Team A scored more than Team B" before you read a single number. A line chart shows whether something is going up or down over time.
Matplotlib is the basic drawing tool — you can make any chart but you have to tell it exactly how. Seaborn is like Matplotlib with built-in templates — it makes statistical charts look great with way less code.
Matplotlib Basics
Matplotlib is the foundation of much of Python visualization. Seaborn and Pandas plotting are both built on top of it. A Figure is the whole canvas; an Axes is one plot area inside it. Understand Figure and Axes, and everything else clicks.
The Figure/Axes model
import matplotlib.pyplot as plt
# Method 1: Quick and simple
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Simple Line")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
# Method 2: Object-oriented (recommended for anything beyond a quick sketch)
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
ax.set_title("Simple Line")
ax.set_xlabel("X")
ax.set_ylabel("Y")
plt.show()
Use the object-oriented API (fig, ax = plt.subplots()) for real work. The plt.plot() shortcut works for quick experiments, but the OO API gives you full control and works properly with subplots.
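A small sketch of why the explicit style matters once a figure has more than one panel. The panel names ax_left and ax_right are chosen here for illustration; the stateful call lands on whichever axes pyplot currently considers active, while the object-oriented call names its target.

```python
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Stateful call: pyplot applies it to the "current" axes, which here is
# the most recently created one (the right-hand panel)
plt.title("Set through plt.title()")

# Object-oriented call: the target is explicit
ax_left.set_title("Set through ax_left.set_title()")

print(plt.gca() is ax_right)
```

Add plt.show() at the end to display the figure in an interactive session.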
Chart Types with Code
Line chart — trends over time
fig, ax = plt.subplots(figsize=(8, 5))
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12000, 15000, 13500, 18000, 21000, 19500]
ax.plot(months, revenue, marker="o", linewidth=2, color="#6366f1")
ax.set_title("Monthly Revenue 2025")
ax.set_ylabel("Revenue ($)")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Bar chart — comparing categories
fig, ax = plt.subplots(figsize=(8, 5))
departments = ["Eng", "Sales", "Marketing", "Product", "Support"]
headcount = [45, 30, 20, 15, 25]
bars = ax.bar(departments, headcount, color="#6366f1", edgecolor="white")
ax.set_title("Headcount by Department")
ax.set_ylabel("Employees")
# Add value labels on top of each bar
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height + 0.5,
            str(int(height)), ha="center", va="bottom", fontweight="bold")
plt.tight_layout()
plt.show()
Use horizontal bars (ax.barh()) when category names are long.
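A minimal sketch of the horizontal variant, assuming made-up team names and headcounts (they are illustrative, not real data). Sorting in ascending order first puts the largest bar at the top, because barh() draws the first category at the bottom.

```python
import matplotlib.pyplot as plt

# Hypothetical data with long category names
teams = ["Customer Success and Support", "Platform Engineering",
         "Growth Marketing", "Enterprise Sales"]
headcount_long = [25, 45, 20, 30]

# Sort ascending so the largest bar ends up at the top of the chart
pairs = sorted(zip(headcount_long, teams))
values = [v for v, _ in pairs]
names = [n for _, n in pairs]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(names, values, color="#6366f1", edgecolor="white")
ax.set_xlabel("Employees")
ax.set_title("Headcount by Team")
fig.tight_layout()
```

Long labels now read left to right along the y-axis instead of overlapping under the bars.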
Scatter plot — correlations
import numpy as np
fig, ax = plt.subplots(figsize=(8, 5))
# Simulated data
np.random.seed(42)
hours_studied = np.random.uniform(1, 10, 50)
test_scores = hours_studied * 8 + np.random.normal(0, 5, 50)
ax.scatter(hours_studied, test_scores, alpha=0.7, color="#6366f1", edgecolors="white")
ax.set_title("Study Hours vs Test Score")
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Test Score")
plt.tight_layout()
plt.show()
Add a trend line with np.polyfit() to show correlation direction.
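One way the np.polyfit() trend line could look, sketched on simulated data built with a recipe similar to the scatter example. A degree-1 fit is assumed here, which only captures a straight-line relationship; the variable names are chosen for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data: score rises about 8 points per hour, plus noise
rng = np.random.default_rng(42)
hours = rng.uniform(1, 10, 50)
scores = hours * 8 + rng.normal(0, 5, 50)

# A degree-1 fit returns the coefficients highest power first: slope, then intercept
slope, intercept = np.polyfit(hours, scores, 1)

# Evaluate the fitted line across the observed x range
x_line = np.linspace(hours.min(), hours.max(), 100)
y_line = slope * x_line + intercept

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(hours, scores, alpha=0.7, color="#6366f1", edgecolors="white")
ax.plot(x_line, y_line, color="#ef4444", linewidth=2,
        label=f"Trend: slope = {slope:.1f}")
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Test Score")
ax.legend()
fig.tight_layout()
```

A positive slope indicates that the two variables rise together; a slope near zero suggests no linear relationship.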
Histogram — distributions
fig, ax = plt.subplots(figsize=(8, 5))
salaries = np.random.normal(75000, 15000, 1000) # simulated
ax.hist(salaries, bins=30, color="#6366f1", edgecolor="white", alpha=0.8)
ax.set_title("Salary Distribution")
ax.set_xlabel("Salary ($)")
ax.set_ylabel("Count")
ax.axvline(np.mean(salaries), color="#ef4444", linestyle="--", label=f"Mean: ${np.mean(salaries):,.0f}")
ax.legend()
plt.tight_layout()
plt.show()
The number of bins matters: too few hides patterns, too many creates noise.
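To make the effect of the bin count concrete, here is a sketch that draws the same simulated salaries with three different settings. The specific counts (5, 30, 200) are arbitrary choices for illustration. NumPy's "auto" rule is shown as one possible starting point, not a definitive answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
salaries_sim = rng.normal(75000, 15000, 1000)  # simulated salaries

# Same data, three bin counts, side by side
bin_choices = [5, 30, 200]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, n_bins in zip(axes, bin_choices):
    ax.hist(salaries_sim, bins=n_bins, color="#6366f1", edgecolor="white")
    ax.set_title(f"bins={n_bins}")
    ax.set_xlabel("Salary ($)")
axes[0].set_ylabel("Count")

# NumPy can suggest bin edges; "auto" uses the narrower bin width of the
# Sturges and Freedman-Diaconis estimators
auto_edges = np.histogram_bin_edges(salaries_sim, bins="auto")
print(f"'auto' suggests {len(auto_edges) - 1} bins")
fig.tight_layout()
```

With 5 bins the bell shape is barely visible, and with 200 the bars become jagged. ax.hist() also accepts bins="auto" directly.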
Pie chart — proportions (use with caution)
fig, ax = plt.subplots(figsize=(6, 6))
labels = ["Product A", "Product B", "Product C", "Product D"]
sizes = [40, 30, 20, 10]
colors = ["#6366f1", "#10b981", "#f59e0b", "#ef4444"]
ax.pie(sizes, labels=labels, colors=colors, autopct="%1.0f%%",
       startangle=90, wedgeprops={"edgecolor": "white", "linewidth": 2})
ax.set_title("Revenue by Product")
plt.tight_layout()
plt.show()
Seaborn — Statistical Visualization
Seaborn wraps Matplotlib with better defaults and built-in statistical calculations. It works directly with Pandas DataFrames.
Distribution: histplot
import seaborn as sns
import pandas as pd
# Seaborn works best with DataFrames
df = pd.DataFrame({"salary": np.random.normal(75000, 15000, 500),
                   "dept": np.random.choice(["Eng", "Sales", "Marketing"], 500)})
fig, ax = plt.subplots(figsize=(8, 5))
sns.histplot(data=df, x="salary", hue="dept", bins=25, alpha=0.6, ax=ax)
ax.set_title("Salary Distribution by Department")
plt.tight_layout()
plt.show()
Comparison: boxplot
fig, ax = plt.subplots(figsize=(8, 5))
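# Note: seaborn 0.13+ deprecates palette without hue; on those versions add hue="dept", legend=False
sns.boxplot(data=df, x="dept", y="salary", palette="Set2", ax=ax)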
ax.set_title("Salary by Department")
plt.tight_layout()
plt.show()
Correlation: heatmap
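# The heatmap needs several numeric columns; these extra ones are simulated for illustration
df["years_exp"] = np.random.uniform(0, 20, len(df))
df["satisfaction"] = np.random.uniform(1, 5, len(df))
df["projects"] = np.random.randint(1, 12, len(df))
# Create a correlation matrix from your DataFrame
corr = df[["salary", "years_exp", "satisfaction", "projects"]].corr()
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0,
            square=True, linewidths=1, ax=ax)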
ax.set_title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()
Set center=0 on correlation heatmaps so the color scale is symmetric around zero.
Explore everything: pairplot
# Scatter plots for every pair of numeric columns
# Diagonal shows the distribution of each variable
sns.pairplot(df, hue="dept", palette="Set2", height=2.5)
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()
Categorical: catplot
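# Combines box, violin, bar, swarm, etc. in one function
# split=True is designed for a hue with two levels; this "level" column is simulated
df["level"] = np.random.choice(["Junior", "Senior"], len(df))
sns.catplot(data=df, x="dept", y="salary", hue="level",
            kind="violin", split=True, height=5, aspect=1.5)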
plt.title("Salary Distribution by Department and Level")
plt.show()
Customization
The difference between a quick plot and a presentation-ready visual is customization. Here are the essentials.
Colors, labels, and legends
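# Reuses months and revenue from the line chart; the cost figures are made up for illustration
costs = [9000, 11000, 10500, 12500, 14000, 13500]
fig, ax = plt.subplots(figsize=(10, 6))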
ax.plot(months, revenue, color="#6366f1", linewidth=2.5, label="Revenue")
ax.plot(months, costs, color="#ef4444", linewidth=2.5, linestyle="--", label="Costs")
ax.set_title("Revenue vs Costs", fontsize=16, fontweight="bold")
ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Amount ($)", fontsize=12)
ax.legend(loc="upper left", frameon=True, fontsize=11)
ax.grid(True, alpha=0.3)
# Remove top and right spines for a cleaner look
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
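Axis tick labels are another detail worth tidying on money charts. The sketch below assumes the revenue figures from the line chart and uses Matplotlib's StrMethodFormatter to print ticks as dollars; the format string is one reasonable choice among several.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

months_demo = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue_demo = [12000, 15000, 13500, 18000, 21000, 19500]  # same figures as the line chart

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(months_demo, revenue_demo, marker="o", color="#6366f1", label="Revenue")

# Show y ticks as dollars with thousands separators, e.g. 12000 -> $12,000
currency = StrMethodFormatter("${x:,.0f}")
ax.yaxis.set_major_formatter(currency)

ax.set_ylabel("Revenue")
ax.legend(loc="upper left")
fig.tight_layout()
```

With the unit in the tick labels, the axis label no longer needs to repeat "($)".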
Seaborn themes
# Set a global theme (affects all subsequent plots)
sns.set_theme(style="whitegrid") # options: white, dark, whitegrid, darkgrid, ticks
sns.set_palette("Set2") # color palette
# Or use a context for font scaling
sns.set_context("talk") # options: paper, notebook (default), talk, poster
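Before setting a palette globally, it can help to look at what a palette name actually contains. This sketch assumes seaborn is installed and uses sns.color_palette() to fetch a few named palettes; the n_colors values are arbitrary.

```python
import seaborn as sns

# Named palettes come back as a list of RGB tuples
categorical = sns.color_palette("Set2", n_colors=4)
colorblind_safe = sns.color_palette("colorblind", n_colors=4)
sequential = sns.color_palette("Blues", n_colors=5)
diverging = sns.color_palette("coolwarm", n_colors=7)

# Convert to hex strings for use in plain Matplotlib calls or a style guide
print(categorical.as_hex())
```

In a notebook, evaluating a palette object on its own line shows the colors as swatches.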
"Set2" or "tab10" for categorical data. Use "coolwarm" or "RdYlGn" for diverging data. Use "Blues" or "viridis" for sequential data. Avoid red/green combinations for accessibility.
Which Chart for Which Data?
Choosing the right chart is more important than making it pretty. Here is a decision table.
| Your Question | Data Shape | Chart Type | Code |
|---|---|---|---|
| How does X change over time? | Numeric over time | Line chart | ax.plot() |
| Which category is biggest? | Categories vs values | Bar chart | ax.bar() |
| Are X and Y related? | Two numeric variables | Scatter plot | ax.scatter() |
| What does the distribution look like? | One numeric variable | Histogram | ax.hist() / sns.histplot() |
| How do groups compare (with outliers)? | Groups of numeric values | Box plot | sns.boxplot() |
| What correlates with what? | Many numeric columns | Heatmap | sns.heatmap() |
| What fraction is each part? | 3-5 categories, proportions | Pie / Donut | ax.pie() |
| Explore all relationships at once? | Multi-column DataFrame | Pair plot | sns.pairplot() |
Subplots for Dashboards
Combine multiple charts into a single figure to tell a multi-faceted story.
# 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Top-left: line chart
axes[0, 0].plot(months, revenue, marker="o", color="#6366f1")
axes[0, 0].set_title("Monthly Revenue")
# Top-right: bar chart
axes[0, 1].bar(departments, headcount, color="#10b981")
axes[0, 1].set_title("Headcount")
# Bottom-left: histogram
axes[1, 0].hist(salaries, bins=25, color="#f59e0b", edgecolor="white")
axes[1, 0].set_title("Salary Distribution")
# Bottom-right: scatter
axes[1, 1].scatter(hours_studied, test_scores, alpha=0.6, color="#ef4444")
axes[1, 1].set_title("Study vs Score")
fig.suptitle("Company Dashboard", fontsize=16, fontweight="bold")
plt.tight_layout()
plt.show()
Call plt.tight_layout() or fig.tight_layout() to avoid overlapping labels. For more control, use fig.subplots_adjust() or GridSpec.
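As a sketch of the GridSpec option, the layout below gives one wide panel across the top and two smaller ones underneath. The ratios, spacing values, and plotted numbers are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(2, 2, figure=fig, height_ratios=[2, 1], hspace=0.4, wspace=0.3)

# One wide panel across the top row, two smaller panels underneath
ax_top = fig.add_subplot(gs[0, :])
ax_left = fig.add_subplot(gs[1, 0])
ax_right = fig.add_subplot(gs[1, 1])

# Placeholder values for illustration only
ax_top.plot([1, 2, 3, 4, 5, 6], [12, 15, 13.5, 18, 21, 19.5], marker="o")
ax_top.set_title("Wide panel")
ax_left.bar(["A", "B", "C"], [3, 5, 2])
ax_left.set_title("Bottom left")
ax_right.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
ax_right.set_title("Bottom right")

fig.suptitle("Uneven layout with GridSpec")
```

Slicing the grid (gs[0, :]) is what lets one Axes span several cells, which a plain plt.subplots(2, 2) grid cannot do.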
Saving Figures
# Save as PNG (Matplotlib's default is 100 DPI; 150 gives a sharper image)
fig.savefig("chart.png", dpi=150, bbox_inches="tight")
# Save as SVG (vector, scales perfectly)
fig.savefig("chart.svg", bbox_inches="tight")
# Save as PDF (great for reports)
fig.savefig("chart.pdf", bbox_inches="tight")
# Transparent background (useful for presentations); a separate filename avoids overwriting chart.png
fig.savefig("chart_transparent.png", dpi=150, transparent=True, bbox_inches="tight")
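A quick way to confirm the exports worked is to write each format and check the file sizes. This sketch saves into a temporary directory, which is an arbitrary location picked for the demo; the toy figure stands in for a real chart.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Export check")

# Write each format into a temporary directory and record the file size
out_dir = tempfile.mkdtemp()
saved = {}
for ext in ("png", "svg", "pdf"):
    path = os.path.join(out_dir, f"chart.{ext}")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    saved[ext] = os.path.getsize(path)

plt.close(fig)  # release the figure's memory once it is on disk
print(saved)
```

Closing figures after saving matters in loops that generate many charts, since open figures stay in memory.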
Common Mistakes
Forgetting plt.tight_layout(). Labels get cut off. Always call it before show() or savefig().
Test Yourself
Q: What is the difference between plt.plot() and the object-oriented approach fig, ax = plt.subplots()?
plt.plot() is a stateful shortcut that acts on the "current" axes. The OO approach (fig, ax) explicitly creates a Figure and Axes object, giving you full control. It is the practical choice for subplots and multiple axes, and the recommended style for all non-trivial charts.
Q: When should you use a scatter plot vs a line chart?
Q: How do you show distributions of a variable broken down by category in Seaborn?
(1) sns.histplot(data=df, x="value", hue="category") for overlapping histograms. (2) sns.boxplot(data=df, x="category", y="value") for summary statistics with outliers. (3) sns.violinplot() for shape + density. (4) sns.kdeplot() for smooth density curves.
Q: What does sns.heatmap(corr, annot=True, center=0) do?
annot=True writes the correlation values in each cell. center=0 makes the color scale symmetric around zero, so positive correlations get one color and negative correlations another, with the colormap's neutral midpoint at zero. This makes it easy to spot strong positive and negative relationships at a glance.
Q: Name 3 things you should check before presenting a chart to stakeholders.
Interview Questions
Q: You are given a dataset with sales by region and quarter. How would you visualize it to compare both across regions and across time?
sns.catplot(kind="bar", x="quarter", hue="region") works well.
Q: What is the difference between Matplotlib and Seaborn? When would you choose one over the other?
Q: How would you customize a Matplotlib chart to match your company's brand colors and fonts?
(1) Define the palette as a list: brand_colors = ["#1a73e8", "#34a853", ...]. (2) Set them globally: plt.rcParams["axes.prop_cycle"] = plt.cycler(color=brand_colors). (3) Set fonts: plt.rcParams["font.family"] = "Arial". (4) Alternatively, create a .mplstyle file and load it with plt.style.use("my_brand.mplstyle"). This ensures every chart in a notebook matches the brand.
Q: A stakeholder says your chart is "misleading." What are the most common ways charts mislead, and how do you avoid them?
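For the brand colors and fonts question, the rcParams steps can be sketched as below. The four hex codes are a hypothetical palette, and the generic "sans-serif" family is used so the snippet does not depend on a particular font being installed. rc_context() is used here instead of global assignment so the settings apply only inside the with-block.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Hypothetical brand palette; substitute your own hex codes
brand_colors = ["#1a73e8", "#34a853", "#fbbc04", "#ea4335"]

brand_rc = {
    "axes.prop_cycle": plt.cycler(color=brand_colors),
    "font.family": "sans-serif",
    "axes.titlesize": 14,
    "axes.titleweight": "bold",
}

# rc_context applies the settings only inside the with-block
with mpl.rc_context(brand_rc):
    fig, ax = plt.subplots(figsize=(6, 4))
    for i in range(3):
        ax.plot([0, 1, 2], [i, i + 1, i + 0.5], label=f"Series {i + 1}")
    ax.set_title("Brand-styled chart")
    ax.legend()

# Each line picks up the next color in the cycle
line_colors = [line.get_color() for line in ax.lines]
print(line_colors)
```

The same dictionary could be written to a .mplstyle file once the values settle, so that every notebook loads them with a single plt.style.use() call.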