AWS Fundamentals Interview Questions
35+ AWS interview questions with detailed answers, organized by topic. Covers general AWS, IAM & security, compute, storage, networking, managed services, and architecture patterns.
General AWS
Q: What is AWS and why do companies use it instead of on-premises servers?
AWS is Amazon's cloud platform offering 200+ services. Companies use it for: cost savings (pay-as-you-go vs. upfront CapEx), elasticity (scale up/down in seconds), global reach (30+ Regions), speed of innovation (provision resources in minutes vs. weeks), and managed services that reduce operational burden.
Q: Explain the difference between a Region, an Availability Zone, and an Edge Location.
A Region is a geographic area (e.g., us-east-1) containing 2-6 AZs. An Availability Zone is one or more isolated data centers within a Region, connected to the other AZs by low-latency links. An Edge Location is a CloudFront/Route 53 cache point (400+ worldwide) that serves content to nearby users — not a full AWS data center.
Q: What is the AWS Shared Responsibility Model?
AWS is responsible for security OF the cloud — physical data centers, hardware, networking, hypervisor. You are responsible for security IN the cloud — your data, IAM configuration, OS patching (EC2), firewall rules, encryption settings. The split varies by service: more managed services = less your responsibility.
Q: What is the AWS Free Tier and what are its limits?
The Free Tier has 3 types: 12-month free (750 hrs/mo EC2 t2.micro, 5 GB S3, 750 hrs RDS t2.micro), Always free (1M Lambda requests/mo, 25 GB DynamoDB, 1M SNS publishes), and Trials (short-term free for specific services). It's per-account, not per-service — exceeding limits incurs charges immediately.
Q: How do you choose which AWS Region to deploy in?
Consider: (1) Latency — closest to your users, (2) Compliance — data residency laws (e.g., GDPR requires EU), (3) Service availability — not all services are in every Region, (4) Cost — pricing varies by Region (us-east-1 is usually cheapest), (5) Disaster recovery — secondary Region for failover.
IAM & Security
Q: What is the difference between an IAM User, Group, Role, and Policy?
User = identity with permanent credentials (person or service). Group = collection of users sharing the same policies. Role = identity with temporary credentials, assumed by users/services/accounts. Policy = JSON document defining Allow/Deny rules attached to any of the above.
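To make the Policy part concrete, here is a minimal identity-based policy document built as a Python dict (the bucket name is illustrative, not from any real account):

```python
import json

# Minimal identity-based policy: read-only access to one S3 bucket.
# "example-app-logs" is a placeholder bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-app-logs",      # the bucket itself (for ListBucket)
                "arn:aws:s3:::example-app-logs/*",    # objects inside it (for GetObject)
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The same JSON shape can be attached to a User, Group, or Role; only the attachment point changes, not the document format.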
Q: Why should you never use the root account for daily operations?
The root account has unrestricted access — it can delete the entire account, change billing, and bypass all IAM policies. If compromised, total loss. Best practice: enable MFA, create an IAM admin user, lock away root credentials, and only use root for account-level tasks (changing billing, closing the account).
Q: Explain how IAM policy evaluation works when Allow and Deny conflict.
Evaluation order: (1) Start with implicit deny (everything denied by default). (2) Evaluate all applicable policies. (3) Any explicit Deny wins immediately — overrides all Allows. (4) If an Allow exists with no Deny, access granted. (5) No matching statement = implicit deny. Key: Explicit Deny > Allow > Implicit Deny.
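The evaluation order above can be sketched as a toy function (a simplification: real IAM also matches resources, conditions, and policy types, but the Deny > Allow > implicit-deny precedence is the same):

```python
# Toy sketch of IAM policy evaluation: explicit Deny > Allow > implicit deny.
def evaluate(statements, action):
    """statements: list of (effect, action) pairs; effect is 'Allow' or 'Deny'."""
    decision = "ImplicitDeny"          # everything is denied by default
    for effect, stmt_action in statements:
        if stmt_action != action:
            continue                   # statement doesn't apply to this action
        if effect == "Deny":
            return "Deny"              # explicit Deny wins immediately
        decision = "Allow"             # remember the Allow, keep scanning for Denies
    return decision

# An Allow and a Deny on the same action: the Deny wins.
print(evaluate([("Allow", "s3:GetObject"), ("Deny", "s3:GetObject")], "s3:GetObject"))  # Deny
```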
Q: When should you use an IAM Role instead of an IAM User?
Use Roles for: EC2 instances needing AWS access (no hardcoded keys), Lambda functions, cross-account access, federation (SSO users), and any service-to-service communication. Roles provide temporary credentials that auto-rotate, which is more secure than long-lived access keys.
Q: What is a Security Group and how does it differ from a NACL?
Security Group: instance-level, stateful (return traffic auto-allowed), allow-only rules, all rules evaluated. NACL: subnet-level, stateless (must allow both directions), allow+deny rules, evaluated in order. Use SGs as your primary firewall; use NACLs as an additional defense layer.
Q: How would you implement least-privilege access for a development team?
Create IAM Groups per role (developers, ops, read-only). Attach managed policies scoped to specific services and actions. Use Permission Boundaries to cap maximum permissions. Enable AWS CloudTrail to audit all API calls. Review with IAM Access Analyzer to find overly permissive policies. Start restrictive and add permissions as needed.
Compute (EC2, Lambda)
Q: What are the EC2 instance purchasing options and when would you use each?
On-Demand: unpredictable workloads, dev/test. Reserved (1-3yr): steady-state production, up to 72% savings. Savings Plans: flexible commitment across instance families. Spot: fault-tolerant batch jobs, up to 90% savings but can be interrupted. Dedicated Hosts: compliance/licensing requiring physical server control.
Q: Explain the difference between vertical and horizontal scaling on AWS.
Vertical scaling = changing instance size (t3.micro → t3.large). Requires downtime, has an upper limit. Horizontal scaling = adding more instances behind a load balancer using Auto Scaling Groups. No downtime, theoretically unlimited. AWS is designed for horizontal scaling — use ALB + ASG for stateless apps, add read replicas for databases.
Q: What is an AMI and why is it important?
An Amazon Machine Image (AMI) is a template containing the OS, application server, and applications needed to launch an instance. Important because: (1) reproducibility — launch identical instances every time, (2) golden image pattern — bake your app into an AMI for faster Auto Scaling launches, (3) disaster recovery — copy AMIs across Regions.
Q: When would you choose Lambda over EC2?
Lambda: event-driven, short tasks (<15 min), unpredictable traffic, no server management wanted, pay-per-invocation. EC2: long-running processes, specific OS needs, GPU workloads, predictable high-traffic (Reserved Instances are cheaper), stateful applications. Lambda scales instantly; EC2 gives more control.
Q: What is a cold start in Lambda and how do you mitigate it?
A cold start happens when Lambda creates a new execution environment — downloading code, initializing the runtime, running init code. It adds anywhere from ~100 ms to several seconds of latency. Mitigate with: Provisioned Concurrency (pre-warmed environments), smaller deployment packages, lighter runtimes (Python/Node over Java), and minimal initialization code. (VPC attachment used to add large cold-start penalties, but Hyperplane ENIs have largely eliminated that.)
Storage (S3, EBS, EFS)
Q: Explain S3 storage classes and when to use each.
Standard: frequently accessed data (apps, websites). Intelligent-Tiering: unknown/changing patterns (auto-optimizes). Standard-IA: infrequent but needs immediate access (backups). One Zone-IA: infrequent, non-critical (reproducible data). Glacier Instant: rare access, immediate retrieval (compliance). Glacier Flexible: rare, hours to retrieve. Glacier Deep Archive: rarely accessed, cheapest ($0.001/GB/mo).
Q: What is the difference between EBS, EFS, and S3?
EBS: block storage for one EC2 instance (like a hard drive), AZ-specific, fastest. EFS: network file system shared by multiple EC2 instances (NFS), Regional, auto-scaling. S3: object storage via HTTP API, unlimited, cheapest, 11 nines durability. Use EBS for databases, EFS for shared files, S3 for data lakes/backups/static hosting.
Q: How does S3 achieve 99.999999999% durability?
S3 automatically replicates every object across at least 3 Availability Zones within a Region. Each AZ has independent infrastructure. Additionally, S3 performs integrity checks using checksums and automatically repairs any detected corruption. This is built-in — no configuration needed.
Q: How would you secure an S3 bucket that stores sensitive data?
(1) Block public access (account and bucket level). (2) Bucket policy restricting access to specific IAM roles/accounts. (3) Server-side encryption (SSE-S3 or SSE-KMS for key management). (4) Enable versioning (protect against accidental deletes). (5) Enable access logging to audit who accessed what. (6) VPC Endpoint so traffic never leaves AWS network. (7) Object Lock for regulatory compliance (WORM).
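A common hardening statement that complements point (3): a bucket policy denying any request not made over TLS. A sketch of the document, with a placeholder bucket name:

```python
import json

# Deny every request to the bucket that does not use TLS.
# "example-sensitive-data" is a placeholder bucket name.
BUCKET = "example-sensitive-data"

tls_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # aws:SecureTransport is "false" for plain-HTTP requests.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

print(json.dumps(tls_only_policy, indent=2)[:60])
```

Because it is an explicit Deny, this rule wins over any Allow elsewhere, which is exactly the IAM evaluation behavior you want for a hard security floor.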
Q: What is S3 Transfer Acceleration and when would you use it?
S3 Transfer Acceleration uses CloudFront edge locations to speed up uploads. Users upload to the nearest edge location, then data travels over AWS's optimized backbone to the S3 bucket's Region. Use it when users are geographically far from your bucket's Region and need faster uploads. Adds ~$0.04/GB but can improve speed 50-500% for long-distance transfers.
Networking (VPC, Route 53)
Q: Walk through the architecture of a highly available VPC for a production app.
VPC with /16 CIDR across 2+ AZs. Each AZ has: public subnet (ALB, NAT Gateway, bastion host), private app subnet (EC2/ECS), private data subnet (RDS, ElastiCache). Internet Gateway for public subnets. NAT Gateway per AZ for outbound from private subnets. Route tables isolating each tier. Security groups allowing only necessary traffic between tiers.
Q: What is a VPC Endpoint and why would you use one?
A VPC Endpoint allows private connectivity from your VPC to AWS services without using the public internet. Two types: Gateway endpoints (S3, DynamoDB — free) and Interface endpoints (most other services — uses ENI with private IP). Benefits: improved security (no internet exposure), lower latency, and reduced data transfer costs.
Q: Explain the difference between an Internet Gateway, NAT Gateway, and VPN Gateway.
Internet Gateway: connects public subnets to the internet (bidirectional). NAT Gateway: allows private subnets to reach the internet outbound only (instances can't be reached from outside). VPN Gateway: connects your VPC to an on-premises network via encrypted VPN tunnel over the public internet.
Q: How does Route 53 failover routing work?
Create two records: primary (points to main resource) and secondary (points to backup). Associate health checks with the primary. Route 53 periodically checks the primary's health endpoint. If it fails (returns non-200 or timeout), Route 53 automatically switches DNS to the secondary. Recovery is automatic when the primary becomes healthy again.
Managed Services
Q: When would you choose DynamoDB over RDS?
DynamoDB: key-value/document data, single-digit ms latency at any scale, massive write throughput, simple access patterns, serverless (no capacity management). RDS: complex queries with JOINs, ACID transactions across tables, existing SQL applications, reporting/analytics queries. DynamoDB trades query flexibility for infinite scale.
Q: What is Amazon Aurora and why would you choose it over standard RDS?
Aurora is AWS's proprietary database engine, compatible with PostgreSQL and MySQL but delivering up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL. It auto-scales storage to 128 TB, replicates 6 copies of your data across 3 AZs, supports up to 15 read replicas with <10 ms lag, and fails over automatically in ~30 seconds. Choose Aurora for production workloads needing high availability and performance.
Q: Explain the SQS Dead Letter Queue (DLQ) pattern.
When a message in an SQS queue fails to be processed after a configured number of retries (maxReceiveCount), it's moved to a Dead Letter Queue — a separate SQS queue. This prevents poison messages from blocking the main queue. You set up CloudWatch alarms on the DLQ message count to alert on failures. Engineers then inspect DLQ messages, fix the issue, and replay them.
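The redrive behavior is easy to simulate. A toy sketch (in real SQS the retry happens via visibility timeout, and the DLQ is configured with a RedrivePolicy on the source queue, not in application code):

```python
# Toy simulation of SQS redrive: after maxReceiveCount failed receives,
# a message moves to the DLQ instead of returning to the main queue.
MAX_RECEIVE_COUNT = 3

def process_with_redrive(messages, handler):
    main, dlq = list(messages), []
    while main:
        msg = main.pop(0)
        msg["receive_count"] = msg.get("receive_count", 0) + 1
        try:
            handler(msg)
        except Exception:
            if msg["receive_count"] >= MAX_RECEIVE_COUNT:
                dlq.append(msg)      # poison message: park it for inspection
            else:
                main.append(msg)     # visibility timeout expires, message retried
    return dlq

# A message that always fails ends up in the DLQ after 3 attempts.
def handler(msg):
    if msg["body"] == "poison":
        raise ValueError("cannot process")

dead = process_with_redrive([{"body": "ok"}, {"body": "poison"}], handler)
print(len(dead))  # 1
```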
Q: How do you implement Infrastructure as Code on AWS?
CloudFormation: native AWS, YAML/JSON templates, full AWS service support. AWS CDK: write infrastructure in TypeScript/Python/Java, compiles to CloudFormation. Terraform: multi-cloud, HCL syntax, large community. Pulumi: like CDK but multi-cloud. Store templates in Git, deploy via CI/CD (CodePipeline or GitHub Actions), use change sets/plan to preview changes before applying.
Q: What is ElastiCache and when would you add it to your architecture?
ElastiCache is managed Redis or Memcached for in-memory caching. Add it when: database reads are high and repetitive (cache query results), you need sub-millisecond response times (session storage, leaderboards), or your database CPU is high from read traffic. Typical pattern: check cache first → if miss, query database → store result in cache with TTL.
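That check-cache-first flow is the cache-aside pattern. A minimal sketch, with a dict standing in for Redis and a counter standing in for the database:

```python
import time

# Cache-aside: check the cache, fall back to the database on a miss,
# then populate the cache with a TTL. The dict stands in for Redis;
# fake_db is a placeholder for a real query.
cache = {}          # key -> (value, expires_at)
TTL_SECONDS = 300
db_queries = 0      # counts how often we actually hit the "database"

def fake_db(key):
    global db_queries
    db_queries += 1
    return f"row-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                             # cache hit
    value = fake_db(key)                            # cache miss: query the database
    cache[key] = (value, time.time() + TTL_SECONDS) # store with TTL
    return value

get("user:42")
get("user:42")
print(db_queries)  # 1 — the second read was served from cache
```

With real Redis the dict operations become `GET`/`SETEX` calls, but the control flow is identical.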
Architecture & Best Practices
Q: What are the five pillars of the AWS Well-Architected Framework?
(1) Operational Excellence — automate operations, respond to events, learn from failures. (2) Security — protect data, systems, assets; least privilege. (3) Reliability — recover from failures, scale to meet demand. (4) Performance Efficiency — use resources efficiently, experiment with new services. (5) Cost Optimization — avoid unnecessary costs, analyze spending. (Bonus 6th: Sustainability — minimize environmental impact.)
Q: Design a highly available web application architecture on AWS.
Route 53 → CloudFront (CDN) → ALB (Application Load Balancer, multi-AZ) → EC2 Auto Scaling Group (min 2, across 2+ AZs) → Aurora Multi-AZ (writer + reader) with ElastiCache for caching. Static assets on S3 + CloudFront. Logs to CloudWatch. Alarms via SNS. Infrastructure defined in CloudFormation/CDK. All in private subnets except ALB.
Q: How would you migrate a monolithic on-premises application to AWS?
The "6 Rs": Rehost (lift-and-shift to EC2), Replatform (minor optimizations — e.g., move DB to RDS), Refactor (re-architect for cloud-native — microservices, Lambda, containers), Repurchase (switch to SaaS), Retire (decommission unused), Retain (keep on-prem). Start with Rehost for quick migration, then iteratively Replatform/Refactor. Use AWS Migration Hub to track progress.
Q: How do you optimize AWS costs for a production workload?
(1) Right-size instances using Compute Optimizer recommendations. (2) Reserved Instances / Savings Plans for steady-state workloads (up to 72% savings). (3) Spot Instances for fault-tolerant batch jobs. (4) S3 Lifecycle Rules to move data to cheaper tiers. (5) Stop/terminate unused resources (use AWS Trusted Advisor). (6) Auto Scaling to match capacity to demand. (7) Tag everything for cost allocation.
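Point (4) is usually expressed as a lifecycle configuration. A sketch of the structure boto3's `put_bucket_lifecycle_configuration` expects (the prefix and day thresholds are illustrative):

```python
# S3 lifecycle configuration: age objects under logs/ into cheaper storage
# classes, then expire them. Day thresholds are illustrative, not a recommendation.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
            ],
            "Expiration": {"Days": 365},                      # delete after a year
        }
    ],
}

print(lifecycle["Rules"][0]["ID"])
```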
Q: What is the difference between disaster recovery strategies: Backup/Restore, Pilot Light, Warm Standby, and Multi-Site?
Backup/Restore: cheapest, slowest (hours). Back up to S3, restore when needed. Pilot Light: core services running at minimum (DB replica), scale up on disaster (minutes-hours). Warm Standby: scaled-down copy always running, scale up quickly (minutes). Multi-Site Active/Active: full copy in another Region, instant failover, most expensive. Choose based on RTO/RPO requirements and budget.