AWS Managed Services

TL;DR

Managed services let AWS handle patching, backups, scaling, and availability. RDS = managed relational databases. DynamoDB = serverless NoSQL. SQS/SNS = messaging and notifications. CloudWatch = monitoring and alerts. CloudFormation = infrastructure as code.

Explain Like I'm 12

Instead of cooking every meal from scratch, you can order from a restaurant that handles shopping, cooking, and cleaning. RDS is like ordering a database meal — you pick the type (PostgreSQL, MySQL), and AWS cooks it, keeps it fresh, and makes backups. SQS is a to-do list that multiple workers can pull tasks from. CloudWatch is the security camera system watching everything in your AWS house.

Managed Services Ecosystem

[Diagram: AWS managed services ecosystem showing RDS, DynamoDB, SQS, SNS, CloudWatch, and CloudFormation with their interactions]

RDS — Relational Database Service

RDS manages relational databases so you don't have to handle installation, patching, backups, or replication. It supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Amazon Aurora.

RDS vs. Self-Managed on EC2

| Responsibility | RDS (Managed) | EC2 (Self-Managed) |
| --- | --- | --- |
| OS patching | AWS handles it | You do it |
| Automated backups | Built-in (up to 35-day retention) | You script it |
| Multi-AZ failover | One checkbox | You build replication |
| Read replicas | CLI command | You configure replication |
| Scaling | Modify instance class | Manual migration |
| Custom DB extensions | Limited | Full control |
# Create a PostgreSQL RDS instance
aws rds create-db-instance \
  --db-instance-identifier my-postgres \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --engine-version 16.1 \
  --master-username dbadmin \
  --master-user-password MySecurePass123! \
  --allocated-storage 20 \
  --multi-az \
  --backup-retention-period 7

# Create a read replica for scaling reads
aws rds create-db-instance-read-replica \
  --db-instance-identifier my-postgres-replica \
  --source-db-instance-identifier my-postgres
Tip: Amazon Aurora is AWS's proprietary engine — compatible with PostgreSQL and MySQL, with up to 5x the throughput of standard MySQL (about 3x for PostgreSQL, per AWS) and automatic storage scaling up to 128 TiB. It costs more but is often worth it for production workloads.

DynamoDB — Serverless NoSQL

DynamoDB is a fully managed NoSQL database with single-digit millisecond latency at any scale. No servers to manage, no capacity planning — it scales automatically.

Key Concepts

  • Table — Collection of items (like a SQL table)
  • Partition Key — Determines which partition an item lives in; must be unique by itself, or unique in combination with the sort key
  • Sort Key — Optional secondary key for range queries within a partition
  • GSI (Global Secondary Index) — Query by different attributes
# Create a DynamoDB table
aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions \
    AttributeName=customerId,AttributeType=S \
    AttributeName=orderId,AttributeType=S \
  --key-schema \
    AttributeName=customerId,KeyType=HASH \
    AttributeName=orderId,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

# Put an item
aws dynamodb put-item --table-name Orders \
  --item '{
    "customerId": {"S": "CUST-001"},
    "orderId": {"S": "ORD-2026-0001"},
    "total": {"N": "59.99"},
    "status": {"S": "shipped"}
  }'

# Query all orders for a customer
aws dynamodb query --table-name Orders \
  --key-condition-expression "customerId = :cid" \
  --expression-attribute-values '{":cid": {"S": "CUST-001"}}'
Info: DynamoDB pricing: On-Demand (pay per read/write, no planning) or Provisioned (set read/write capacity units, cheaper if traffic is predictable). Start with On-Demand, switch to Provisioned when you understand your access patterns.
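The partition/sort key model above is easy to see without AWS at all. Below is a minimal in-memory sketch (illustrative only — not the boto3 API) keyed the same way as the Orders table: items live in partitions chosen by the partition key, and stay sorted by the sort key within each partition, which is why a Query by customerId returns that customer's orders in orderId order.

```python
# Minimal in-memory sketch of DynamoDB's key model (illustrative, not boto3):
# items live in partitions keyed by the partition key, and are kept sorted
# by the sort key within each partition.
from collections import defaultdict
import bisect

class MiniTable:
    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> sorted [(sort_key, item)]

    def put_item(self, pk, sk, item):
        bisect.insort(self.partitions[pk], (sk, item))  # keep sort-key order

    def query(self, pk):
        # A Query targets exactly one partition — cheap at any table size.
        return [item for _, item in self.partitions[pk]]

table = MiniTable()
table.put_item("CUST-001", "ORD-2026-0002", {"total": "12.50"})
table.put_item("CUST-001", "ORD-2026-0001", {"total": "59.99"})
table.put_item("CUST-002", "ORD-2026-0003", {"total": "8.00"})

print([o["total"] for o in table.query("CUST-001")])  # → ['59.99', '12.50']
```

Note that this also shows DynamoDB's core limitation: you can only query efficiently by the keys you chose up front, which is why GSIs exist.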

SQS & SNS — Messaging

These two services decouple your components so they don't need to communicate directly.

SQS vs. SNS

| Feature | SQS (Queue) | SNS (Pub/Sub) |
| --- | --- | --- |
| Pattern | Pull — consumers poll for messages | Push — subscribers receive immediately |
| Consumers | One consumer processes each message | Multiple subscribers get every message |
| Retention | Up to 14 days | No retention (deliver or lose) |
| Use case | Task queues, order processing | Notifications, fan-out to multiple services |
# Create an SQS queue
QUEUE_URL=$(aws sqs create-queue --queue-name order-processing \
  --query 'QueueUrl' --output text)

# Send a message
aws sqs send-message --queue-url "$QUEUE_URL" \
  --message-body '{"orderId": "ORD-001", "action": "process"}'

# Receive and delete a message
MSG=$(aws sqs receive-message --queue-url "$QUEUE_URL" \
  --query 'Messages[0]' --output json)
# ... process the message ...
aws sqs delete-message --queue-url "$QUEUE_URL" \
  --receipt-handle "$(echo "$MSG" | jq -r '.ReceiptHandle')"

# Create an SNS topic and subscribe email
TOPIC_ARN=$(aws sns create-topic --name alerts \
  --query 'TopicArn' --output text)
aws sns subscribe --topic-arn $TOPIC_ARN \
  --protocol email --notification-endpoint [email protected]
Tip: A common pattern is SNS → SQS fan-out: publish one event to SNS, and multiple SQS queues receive a copy. This lets different services process the same event independently.
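The fan-out semantics are simple enough to simulate locally. This is a sketch of the pattern's behavior (one publish, every subscribed queue gets its own independent copy), not the SNS/SQS API — the class and queue names are illustrative:

```python
# Local sketch of SNS -> SQS fan-out semantics (illustrative, not boto3):
# one publish delivers an independent copy to every subscribed queue.
from collections import deque

class Topic:
    def __init__(self):
        self.subscriptions = []

    def subscribe(self, queue):
        self.subscriptions.append(queue)

    def publish(self, message):
        for queue in self.subscriptions:
            queue.append(dict(message))  # each queue gets its own copy

order_events = Topic()
inventory_q, email_q = deque(), deque()
order_events.subscribe(inventory_q)
order_events.subscribe(email_q)

order_events.publish({"orderId": "ORD-001", "action": "process"})

# Both consumers see the event; consuming from one queue doesn't affect the other.
print(inventory_q.popleft()["orderId"])  # → ORD-001
print(len(email_q))                      # → 1
```

Adding a new consumer is just another `subscribe` — existing publishers and consumers don't change, which is the decoupling the tip describes.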

CloudWatch — Monitoring & Observability

CloudWatch collects metrics, logs, and events from all AWS services. It's your single pane of glass for monitoring.

  • Metrics — CPU, memory, request count, error rate (auto-collected for most services)
  • Alarms — Trigger actions when metrics cross thresholds (e.g., auto-scale, send SNS alert)
  • Logs — Centralized log storage from EC2, Lambda, containers, etc.
  • Dashboards — Custom visualizations of your metrics
# Create an alarm: alert if CPU > 80% for 5 minutes
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu-alarm \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts

# Query logs
aws logs filter-log-events \
  --log-group-name /aws/lambda/processUpload \
  --filter-pattern "ERROR"
Warning: CloudWatch Logs can get expensive at scale. Set retention policies (e.g., 30 days) to auto-delete old logs. For long-term storage, export to S3.
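The alarm above fires when the 5-minute average breaches the threshold for one evaluation period. The core evaluation rule can be sketched in a few lines — note this is a simplification of real CloudWatch behavior, which also supports "M out of N" datapoints and configurable missing-data treatment:

```python
# Simplified sketch of CloudWatch alarm evaluation: the alarm enters ALARM
# state when the statistic breaches the threshold for N consecutive
# evaluation periods. (Real CloudWatch also handles missing data and
# "M out of N" datapoint configurations.)
def alarm_state(datapoints, threshold=80.0, evaluation_periods=1):
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breached = all(p > threshold for p in recent)  # GreaterThanThreshold
    return "ALARM" if breached else "OK"

cpu_averages = [42.0, 55.0, 91.5]       # one 5-minute average per period
print(alarm_state(cpu_averages))        # → ALARM
print(alarm_state([42.0, 95.0, 70.0]))  # → OK
```

Raising `--evaluation-periods` trades alert speed for noise: a single spike no longer pages anyone, but a sustained breach still does.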

CloudFormation — Infrastructure as Code

CloudFormation lets you define your entire AWS infrastructure in YAML or JSON templates. Instead of clicking through the console, you version-control your infrastructure like application code.

# template.yaml — Create a VPC + EC2 instance
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple web server stack

Resources:
  WebServerInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c02fb55956c7d316 # AMI IDs are region-specific; look up a current one for your region
      SecurityGroupIds:
        - !Ref WebServerSG
      Tags:
        - Key: Name
          Value: my-web-server

  WebServerSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

Outputs:
  InstanceId:
    Value: !Ref WebServerInstance
  PublicIP:
    Value: !GetAtt WebServerInstance.PublicIp
# Deploy the stack
aws cloudformation create-stack \
  --stack-name my-web-stack \
  --template-body file://template.yaml

# Update an existing stack
aws cloudformation update-stack \
  --stack-name my-web-stack \
  --template-body file://template.yaml

# Delete everything created by the stack
aws cloudformation delete-stack --stack-name my-web-stack
Info: AWS CDK (Cloud Development Kit) lets you write infrastructure in Python, TypeScript, Java, or Go instead of YAML. It compiles down to CloudFormation templates. Many teams prefer CDK for complex infrastructure.

Test Yourself

When would you choose DynamoDB over RDS?

Choose DynamoDB for: key-value lookups, single-digit millisecond latency at any scale, unpredictable/bursty traffic, simple access patterns (get by ID, query by partition key). Choose RDS for: complex queries with JOINs, transactions across multiple tables, existing SQL-based applications, data that's heavily relational. DynamoDB excels at scale; RDS excels at flexibility.

Explain the SNS → SQS fan-out pattern. Why is it useful?

One event is published to an SNS topic. Multiple SQS queues are subscribed to that topic. Each queue gets a copy of the message. Different consumers process each queue independently. This is useful because: (1) services are decoupled — adding a new consumer just means subscribing a new queue, (2) each consumer processes at its own pace, (3) if one consumer fails, others are unaffected.

What happens when you delete a CloudFormation stack?

CloudFormation deletes all resources it created, in reverse dependency order. EC2 instances are terminated, security groups are removed, and S3 buckets are deleted only if empty (deleting a non-empty bucket fails the stack delete). You can protect critical resources with DeletionPolicy: Retain in the template to keep them even when the stack is deleted. RDS DB instances that aren't part of an Aurora cluster default to DeletionPolicy: Snapshot, so a final snapshot is taken before deletion.

Your Lambda function writes to DynamoDB but sometimes fails. How do you prevent data loss?

Use an SQS queue as a buffer between the event source and Lambda. Configure the SQS queue as Lambda's event source with a Dead Letter Queue (DLQ). If Lambda fails to process a message after the maximum retry count, it moves to the DLQ for investigation. This ensures no events are lost. Additionally, configure CloudWatch Alarms on the DLQ message count to alert when failures occur.
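The redrive behavior described here can be sketched locally. The sketch below simulates the semantics (a message received more than maxReceiveCount times moves to the DLQ instead of being redelivered); the value 3 and the handler names are illustrative — the real setting lives in the queue's redrive policy:

```python
# Local sketch of SQS redrive-to-DLQ semantics (illustrative, not boto3):
# a message that fails more than max_receive_count times is parked in the
# dead-letter queue instead of being redelivered.
from collections import deque

def process_with_dlq(messages, handler, max_receive_count=3):
    queue = deque((msg, 0) for msg in messages)  # (message, receive_count)
    dlq = []
    while queue:
        msg, receives = queue.popleft()
        receives += 1
        try:
            handler(msg)                        # success: message is deleted
        except Exception:
            if receives >= max_receive_count:
                dlq.append(msg)                 # give up: park for investigation
            else:
                queue.append((msg, receives))   # redelivered after visibility timeout
    return dlq

def flaky_handler(msg):
    if msg["orderId"] == "ORD-BAD":
        raise RuntimeError("downstream write failed")

dead = process_with_dlq([{"orderId": "ORD-001"}, {"orderId": "ORD-BAD"}], flaky_handler)
print([m["orderId"] for m in dead])  # → ['ORD-BAD']
```

The CloudWatch alarm on DLQ depth mentioned above is what turns this parking lot into an actual alert.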

How would you set up monitoring for a production application on AWS?

Layer your observability: (1) CloudWatch Metrics for CPU, memory, request count, error rates with alarms on thresholds. (2) CloudWatch Logs with structured logging (JSON) and metric filters for application errors. (3) X-Ray for distributed tracing across services. (4) CloudWatch Dashboards for real-time visibility. (5) SNS alerts to Slack/PagerDuty when alarms fire. (6) AWS Health Dashboard for AWS service outages.

Interview Questions

Design an event-driven order processing system using AWS managed services.

API Gateway receives orders → Lambda validates and writes to DynamoDB → DynamoDB Streams triggers another Lambda for fulfillment → publishes to SNS "order-events" topic → SQS queues fan out to: inventory service, notification service (sends email via SES), analytics pipeline (writes to S3 via Kinesis Firehose). CloudWatch monitors all Lambda errors, DynamoDB throttles, and SQS DLQ depths. CloudFormation/CDK defines the entire stack as code.

Your RDS PostgreSQL instance is hitting 90% CPU during peak hours. Walk through your optimization strategy.

Step 1: Enable Performance Insights to identify slow queries. Step 2: Optimize queries (add indexes, rewrite joins). Step 3: Add Read Replicas and route read traffic there. Step 4: Add ElastiCache (Redis) for frequently accessed data. Step 5: If still not enough, vertically scale to a larger instance class. Step 6: Consider Aurora with auto-scaling read replicas. Step 7: For write-heavy, consider DynamoDB for high-throughput access patterns.
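Step 3 (routing reads to replicas) usually happens in application code, since RDS read replicas get their own endpoints. A minimal sketch of a read/write splitting router follows — the endpoint hostnames are hypothetical, and the SELECT check is deliberately naive:

```python
# Minimal sketch of read/write splitting for RDS read replicas:
# writes go to the primary endpoint, reads round-robin across replicas.
# Endpoint names below are hypothetical examples.
import itertools

class EndpointRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)  # spread read load across replicas
        return self.primary              # writes (and fallback) hit the primary

router = EndpointRouter(
    primary="my-postgres.abc.us-east-1.rds.amazonaws.com",
    replicas=["my-postgres-replica.abc.us-east-1.rds.amazonaws.com"],
)
print(router.endpoint_for("SELECT * FROM orders"))    # replica endpoint
print(router.endpoint_for("INSERT INTO orders ..."))  # primary endpoint
```

A production router also has to handle replication lag and statements like SELECT ... FOR UPDATE, which must hit the primary despite starting with SELECT.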

Why should you use Infrastructure as Code (CloudFormation/CDK) instead of the AWS Console?

Repeatability — deploy the same stack across dev/staging/prod identically. Version control — track every change in Git, review via PRs. Rollback — CloudFormation can revert failed updates automatically. Documentation — the template IS the documentation. Automation — deploy via CI/CD pipelines. Disaster recovery — recreate entire environments from templates. Console clicking is manual, error-prone, and impossible to audit.