+----------------------------------------------------------------------------+
|                              YOUR APPLICATION                              |
+----------------------------------------------------------------------------+
| +----------------------+ +----------------------+ +----------------------+ |
| |     Operational      | |       Security       | |     Reliability      | |
| |      Excellence      | |                      | |                      | |
| +----------------------+ +----------------------+ +----------------------+ |
| +----------------------+ +----------------------+ +----------------------+ |
| |     Performance      | |         Cost         | |    Sustainability    | |
| |      Efficiency      | |     Optimization     | |                      | |
| +----------------------+ +----------------------+ +----------------------+ |
+----------------------------------------------------------------------------+
Keyword in scenario | Go-to service | Reason |
---|---|---|
"0.0.0.0/0" + "private subnet" | NAT Gateway/Instance | Private subnets need managed egress, not IGW. |
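"microsecond latency reads" + "NoSQL" | DynamoDB + DAX | DynamoDB alone answers in single-digit milliseconds; DAX caches reads for microsecond responses. |
"automatic key rotation" + "audit" | AWS KMS customer managed key | Automatic rotation can be enabled (yearly by default) and key usage is logged to CloudTrail. |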
"blue/green" + "database" | Aurora clone or RDS Blue/Green | Managed cutover without downtime. |
"global DNS failover" | Route 53 Failover policy | Only policy that swaps to standby on health check. |
"sudden spikes" + "control cost" | Auto Scaling + Spot | Scale out cheaply and terminate when demand drops. |
"serverless" + "event-driven" | Lambda | Pay per execution, scales to zero. |
"task takes 20+ minutes" | ECS Fargate or Batch | Lambda max 15 min timeout. |
"rotate database password" | Secrets Manager | Built-in rotation for RDS, Redshift. |
"who deleted this resource" | CloudTrail | API audit log (WHO did WHAT). |
"is config compliant" | AWS Config | Configuration compliance rules. |
"detect compromised EC2" | GuardDuty | ML-based threat detection. |
"load streaming data to S3" | Kinesis Firehose | Simplest, fully managed. |
"real-time < 1 second" | Kinesis Data Streams | Firehose has 60s+ buffer. |
"serverless containers" | ECS Fargate | No EC2 management needed. |
"infrastructure as code" | CloudFormation | Define infrastructure in templates. |
"deploy app easily, no infra" | Elastic Beanstalk | PaaS for developers. |
"monitor EC2 memory" | CloudWatch Agent | Memory NOT default metric. |
Topic | Number | Why it matters |
---|---|---|
S3 durability | 11 nines (99.999999999%) | Argue for storing mission-critical backups. |
S3 Glacier Deep Archive retrieval | 12-48 hours | Use only when cold archives are acceptable. |
RDS automated backup retention | 0-35 days (0 disables backups; console default is 7) | Remember to schedule manual snapshots for longer. |
DynamoDB capacity math | 1 WCU = 1 KB/s write, 1 RCU = 4 KB/s strongly consistent read | Needed for throughput sizing questions. |
Route 53 health check interval | 30 s default, 10 s fast | Explains failover detection time. |
CloudFront default TTL | 24 hours | Recognize caching behavior without custom headers. |
ELB cross-zone | On by default for ALB, optional (billable) for NLB | Cost/architecture trade-off question staple. |
Multi-AZ requirement | At least 2 AZs per region | Every HA design answer references two or more AZs. |
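The DynamoDB capacity row above can be turned into a quick sizing calculation. The following is a minimal sketch, not an official formula implementation: the function names are hypothetical, and the halving for eventually consistent reads is standard DynamoDB behavior that the table does not state.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate provisioned RCUs.

    One RCU covers one strongly consistent read per second of an item up
    to 4 KB. Eventually consistent reads cost half as much.
    """
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """Estimate provisioned WCUs: one WCU covers one 1 KB write per second."""
    return math.ceil(item_size_kb) * writes_per_second


if __name__ == "__main__":
    # 6 KB items round up to two 4 KB blocks per read.
    print(read_capacity_units(6, 100))
    print(read_capacity_units(6, 100, strongly_consistent=False))
    # 2.5 KB items round up to 3 KB per write.
    print(write_capacity_units(2.5, 50))
```

Item sizes always round up to the next block (4 KB for reads, 1 KB for writes), which is the detail sizing questions usually test.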
~15% of total exam
Network isolation and security
★★★★★ CRITICAL
+----------------------------------------------------------------------------+
|                           YOUR VPC (10.0.0.0/16)                           |
|  +-----------------------------+      +-----------------------------+      |
|  | Public Subnet (Internet)    |      | Private Subnet (Secure)     |      |
|  | - Web Servers (EC2)         |      | - Database (RDS)            |      |
|  | - Public ELB                |      | - Internal Apps (EC2)       |      |
|  +-----------------------------+      +-----------------------------+      |
+----------------------------------------------------------------------------+
        |                                       |
(Internet Gateway) --0.0.0.0/0--> (Route Table) --0.0.0.0/0--> (NAT Gateway)
        |                                       |
(SG: Allow 80/443)                (SG: Allow 3306 from Web SG)
Feature | Security Groups (SG) | Network ACLs (NACL) |
---|---|---|
Applies to | Instance level (ENI) | Subnet level |
State | Stateful (return traffic is auto-allowed) | Stateless (must explicitly allow return traffic) |
Rules | Allow rules only | Allow AND Deny rules |
Evaluation | All rules are evaluated before deciding | Rules are evaluated in number order, lowest first; the first match wins |
Use Case | Instance-level firewall | Subnet-level firewall (first line of defense) |
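The evaluation difference in the table above is easier to see in a small simulation. This is a simplified sketch under stated assumptions: `NaclRule`, `nacl_allows`, and `security_group_allows` are hypothetical names, only inbound TCP on a single port is modeled, and real NACLs and security groups have more fields (protocol, port ranges, egress rules).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str
    port: int
    allow: bool   # NACLs support both ALLOW and DENY


def nacl_allows(rules, source_ip: str, port: int) -> bool:
    """First matching rule (by rule number) decides; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip_address(source_ip) in ip_network(rule.cidr):
            return rule.allow
    return False  # implicit final deny


def security_group_allows(allow_rules, source_ip: str, port: int) -> bool:
    """Security groups only have allow rules; any match permits the traffic."""
    return any(
        rule_port == port and ip_address(source_ip) in ip_network(cidr)
        for cidr, rule_port in allow_rules
    )


if __name__ == "__main__":
    nacl = [
        NaclRule(number=100, cidr="0.0.0.0/0", port=443, allow=True),
        NaclRule(number=90, cidr="203.0.113.7/32", port=443, allow=False),
    ]
    sg = [("0.0.0.0/0", 443)]
    # The DENY at rule 90 is hit before the broad ALLOW at rule 100.
    print(nacl_allows(nacl, "203.0.113.7", 443))
    # A security group has no way to carve out a single address.
    print(security_group_allows(sg, "203.0.113.7", 443))
```

This is why "block a specific IP" points to a NACL: the deny has to sit at a lower rule number than the allow it overrides.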
Scenario | Wrong Answer | Correct Approach | Keyword |
---|---|---|---|
EC2 in private subnet needs to download patches | Move it to a public subnet | Use a NAT Gateway in a public subnet | "private" + "internet access" |
Block a specific malicious IP address | Use a Security Group | Use a NACL with a DENY rule | "block IP" / "deny" |
Connect to S3 securely and cost-effectively | Use a NAT Gateway | Use a Gateway VPC Endpoint for S3 | "private access to S3" |
Connect 10+ VPCs together | Use VPC Peering | Use a Transit Gateway | "simplify" / "scale network" |
~20% of total exam
Provide scalable computing capacity
★★★★★ CRITICAL
Feature | Application Load Balancer (ALB) | Network Load Balancer (NLB) |
---|---|---|
Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP) |
Aware of | Requests, paths, headers (e.g., /users, /images) | IP, Port, Protocol |
Use Case | Web applications, microservices, containers | High-performance, low-latency, static IP needed |
Key Feature | Path-based routing, host-based routing | Ultra-high performance, preserves source IP |
User --> Route 53 --> CloudFront --> ALB --> EC2 Auto Scaling Group
                                                    |  (in multiple AZs)
                                                    v
                                             RDS Multi-AZ DB
~15% of total exam
Durable, scalable object storage
★★★★★ CRITICAL
Class | Use Case | Key Feature |
---|---|---|
S3 Standard | Frequently accessed data | Low latency, high throughput |
S3 Intelligent-Tiering | Unknown or changing access patterns | Automatic cost savings |
S3 Standard-IA | Infrequently accessed, needed quickly | Lower storage cost, retrieval fee |
S3 One Zone-IA | Infrequent, non-critical, reproducible data | Cheapest IA, stored in one AZ |
S3 Glacier Instant Retrieval | Archive data, millisecond access | Fastest archive access |
S3 Glacier Flexible Retrieval | Archive data, minutes to hours access | Flexible retrieval options |
S3 Glacier Deep Archive | Long-term archive, cheapest storage | 12-48 hour retrieval |
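Moving objects down through these classes is usually automated with a lifecycle rule. The sketch below only builds the rule as a plain dictionary in the shape the S3 lifecycle API expects; the helper name, rule ID scheme, and default day counts are assumptions chosen for illustration, and the 30-day minimum before a transition to Standard-IA is an S3 constraint not stated in the table.

```python
import json


def build_lifecycle_rule(prefix: str, ia_after_days: int = 30,
                         glacier_after_days: int = 90,
                         deep_archive_after_days: int = 365) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> Deep Archive."""
    if ia_after_days < 30:
        raise ValueError("objects must be at least 30 days old before moving to STANDARD_IA")
    if not ia_after_days < glacier_after_days < deep_archive_after_days:
        raise ValueError("transition days must increase from warmer to colder classes")
    return {
        "ID": f"tier-down-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


if __name__ == "__main__":
    configuration = {"Rules": [build_lifecycle_rule("logs/")]}
    print(json.dumps(configuration, indent=2))
```

With boto3 installed, a dictionary like `configuration` could be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`; that call is left out here so the sketch runs without AWS credentials.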
~10% of total exam
Manage access to AWS services securely
★★★★★ CRITICAL
~10% of total exam
Operate and scale a relational database
★★★★★ CRITICAL
Feature | Multi-AZ Deployment | Read Replicas |
---|---|---|
Purpose | High Availability / Disaster Recovery | Read Scalability / Performance |
Replication | Synchronous to a standby instance in a different AZ | Asynchronous to one or more read-only copies |
Failover | Automatic, DNS endpoint points to standby | Manual promotion to a standalone DB |
Use Case | Production databases that cannot have downtime | Read-heavy applications, reporting, analytics |
~8-10% of total exam
Run code without managing servers
★★★★★ CRITICAL
┌────────────────────────────────────────────────────────────────┐
│                 TRADITIONAL SERVER vs. LAMBDA                  │
├────────────────────────────────────────────────────────────────┤
│  Rent whole building 24/7       Pay per task                   │
│  Manage infrastructure          AWS handles servers            │
│  Scale manually                 Auto-scales                    │
│  Idle cost = $$$$               Idle cost = $0                 │
└────────────────────────────────────────────────────────────────┘

EVENT SOURCES (Triggers):

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ API Gateway  │────▶│    LAMBDA    │────▶│   DynamoDB   │
│   (HTTP)     │     │  (Process)   │     │   (Store)    │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    ▲                    │
       │                    │                    │
┌──────▼────────┐   ┌───────┴────────┐   ┌───────▼───────┐
│  S3 Events    │   │  EventBridge   │   │  DDB Streams  │
│ (File upload) │   │  (Scheduled)   │   │   (Changes)   │
└───────────────┘   └────────────────┘   └───────────────┘
Resource | Limit | Exam Relevance |
---|---|---|
Execution Timeout | Max 15 minutes | ★★★★★ For long-running tasks, use ECS/Fargate instead |
Memory | 128 MB - 10 GB | ★★★★ More memory = more CPU = faster execution |
Deployment Package | 50 MB (zipped), 250 MB (unzipped) | ★★★ Large dependencies? Use Lambda Layers or Container Images |
Concurrent Executions | 1000 per region (soft limit) | ★★★★ Use Reserved Concurrency to guarantee capacity for critical functions |
/tmp Storage | 512 MB - 10 GB | ★★★ Ephemeral storage; do not rely on it persisting between invocations |
Environment Variables | 4 KB total | ★★ For large configs, use Parameter Store/Secrets Manager |
User Request → API Gateway → Lambda → DynamoDB
↓
CloudFront (optional caching)
Use Case: REST API, mobile backend, single-page apps
Benefits: Auto-scaling, pay-per-request, no server management
Exam Keywords: "serverless", "cost-effective API", "scales to zero"

S3 (Upload) → Lambda (Process) → S3 (Output)
↓
SNS (Notify completion)
Use Case: Image thumbnails, video transcoding, data transformation
Benefits: Event-driven, parallel processing
Exam Keywords: "process uploaded files", "trigger on S3 event"

DynamoDB Streams → Lambda → ElastiCache/S3/Another DDB
Kinesis Data Stream → Lambda → Data Lake (S3)
Use Case: Real-time analytics, data replication, audit logging
Benefits: Batch processing, automatic retries
Exam Keywords: "real-time data processing", "react to changes"

EventBridge (Cron) → Lambda → Task (backup, cleanup, reports)
Use Case: Scheduled backups, daily reports, cleanup jobs
Benefits: No server to maintain, runs only when needed
Exam Keywords: "scheduled task", "cron job", "run daily/hourly"
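For the S3-triggered pattern above, the Lambda function receives the upload notification as a JSON event. Here is a minimal handler sketch, assuming the standard S3 notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the function names are illustrative and the actual processing step is left as a print statement.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record; skip it
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (a space is sent as '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: process each uploaded object once."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(objects)}


if __name__ == "__main__":
    sample = {
        "Records": [
            {"s3": {"bucket": {"name": "uploads"},
                    "object": {"key": "photos/my+cat.jpg"}}}
        ]
    }
    print(handler(sample, None))
```

Decoding the key matters in practice: using the raw value to fetch the object fails for names that contain spaces or special characters.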
Trap | Wrong Answer | Correct Answer |
---|---|---|
Task needs 20 minutes | Use Lambda with 20 min timeout | Lambda max 15 min! Use ECS Fargate or Batch |
Lambda needs database access | Put connection string in code | Use environment variables + Secrets Manager |
High concurrent requests throttling | Increase memory | Increase Reserved Concurrency or request limit increase |
Lambda accessing RDS runs out of connections | Increase Lambda memory | Use RDS Proxy to pool connections |
Large deployment package (300 MB) | Split into multiple functions | Use a Container Image (up to 10 GB); Lambda Layers count toward the 250 MB unzipped limit |
Lambda needs internet + VPC resources | Configure VPC only | VPC Lambda needs NAT Gateway for internet access |
~6-8% of total exam
Monitor, collect, analyze AWS resources & applications
★★★★ IMPORTANT
┌────────────────────────────────────────────────────────────────┐
│                      CLOUDWATCH ECOSYSTEM                      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  CloudWatch LOGS               CloudWatch METRICS              │
│  (What happened?)              (How much/many?)                │
│  - Application logs            - CPU, Memory, Network          │
│  - System logs                 - Custom business metrics       │
│  - VPC Flow Logs               - Auto Scaling triggers         │
│                                                                │
│  CloudWatch ALARMS             EventBridge (CloudWatch Events) │
│  (Alert me when...)            (Do something when...)          │
│  - Threshold breached          - Schedule (cron)               │
│  - SNS notification            - Trigger Lambda                │
│  - Auto Scaling action         - Event patterns                │
│                                                                │
│  CloudWatch DASHBOARDS         CloudWatch Insights             │
│  (Visualization)               (Query & analyze logs)          │
└────────────────────────────────────────────────────────────────┘
fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20
{ "source": ["aws.ec2"], "detail-type": ["EC2 Instance State-change Notification"], "detail": { "state": ["terminated"] } }
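To see how a pattern like the one above selects events, here is a deliberately simplified matcher. It is a sketch of the exact-value matching idea only, not EventBridge's real engine: the `matches` function is hypothetical, it handles scalar event values only, and it ignores operators such as prefix, numeric, and anything-but matching.

```python
EC2_TERMINATED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field named in the pattern matches the event.

    A list in the pattern means "any of these values"; a nested dict is
    matched recursively. Fields absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
    }
    print(matches(EC2_TERMINATED_PATTERN, event))
```

Extra fields in the event (such as `instance-id`) never prevent a match; only the fields listed in the pattern are checked.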
Application → Custom Metric (Active Users) → CloudWatch Alarm
↓
Auto Scaling Policy → Scale EC2/ECS tasks
Use Case: Scale based on business metrics, not just CPU
Exam Keywords: "scale based on queue length", "active connections"

Multiple EC2/ECS → CloudWatch Logs → CloudWatch Logs Insights (Query)
↓
S3 (Archive, Athena analysis)
Use Case: Aggregate logs from multiple sources, long-term storage
Exam Keywords: "centralized logging", "log aggregation", "analyze logs"

AWS Event → EventBridge → Lambda → Remediation Action
(e.g., EC2 stopped → detect → restart instance)
Use Case: Auto-recover from failures, compliance enforcement
Exam Keywords: "automated response", "self-healing", "compliance automation"

EventBridge (Cron: 0 2 * * ? *) → Lambda → Backup/Cleanup Task
Use Case: Daily backups, weekly reports, monthly cleanup
Exam Keywords: "scheduled", "run every day/week", "cron job"
Service | Purpose | What it tracks | Use Case |
---|---|---|---|
CloudWatch | Performance monitoring | Metrics, logs, alarms | "Is my app healthy?" "High CPU usage" |
CloudTrail | Audit logging (WHO did WHAT) | API calls, user activity | "Who deleted this S3 bucket?" "Compliance audit" |
Config | Configuration compliance | Resource config changes over time | "Is my security group compliant?" "Config drift" |
Trap | Wrong Answer | Correct Answer |
---|---|---|
Monitor EC2 memory usage | CloudWatch default metrics | CloudWatch Agent (memory NOT default) |
Long-term log storage (years) | Keep in CloudWatch Logs | Export to S3 (much cheaper) |
Query logs across multiple sources | Manual log download | CloudWatch Logs Insights |
React to EC2 state change | Poll EC2 API | EventBridge event pattern |
Schedule Lambda every hour | CloudWatch Alarms | EventBridge scheduled rule |
Reduce alarm noise (too many alerts) | Increase threshold | Composite Alarms (combine with AND/OR logic) |
~8-10% of total exam
Encryption, secrets management, audit, compliance
★★★★★ CRITICAL
┌────────────────────────────────────────────────────────────────┐
│                        SECURITY LAYERS                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  LAYER 1: ENCRYPTION (Data Protection)                         │
│  ┌──────────────────────────────────────────────────────┐      │
│  │ KMS → Manages encryption keys                        │      │
│  │ CloudHSM → Hardware security module for compliance   │      │
│  └──────────────────────────────────────────────────────┘      │
│                                                                │
│  LAYER 2: SECRETS & CREDENTIALS                                │
│  ┌──────────────────────────────────────────────────────┐      │
│  │ Secrets Manager → Rotate DB passwords, API keys      │      │
│  │ Parameter Store → Store config, lightweight secrets  │      │
│  └──────────────────────────────────────────────────────┘      │
│                                                                │
│  LAYER 3: AUDIT & COMPLIANCE                                   │
│  ┌──────────────────────────────────────────────────────┐      │
│  │ CloudTrail → WHO did WHAT (API audit)                │      │
│  │ Config → Resource config compliance                  │      │
│  │ GuardDuty → Threat detection (ML-based)              │      │
│  └──────────────────────────────────────────────────────┘      │
│                                                                │
│  LAYER 4: NETWORK PROTECTION                                   │
│  ┌──────────────────────────────────────────────────────┐      │
│  │ WAF → Web app firewall (SQL injection, XSS)          │      │
│  │ Shield → DDoS protection                             │      │
│  │ Firewall Manager → Centralized firewall rules        │      │
│  └──────────────────────────────────────────────────────┘      │
└────────────────────────────────────────────────────────────────┘
WHY ENVELOPE ENCRYPTION?
- KMS Encrypt accepts at most 4 KB per API call, so large data cannot be sent to KMS directly (and a network call per chunk would be slow)
- Solution: Use a data key to encrypt the data, and use the master key to encrypt the data key

PROCESS:
1. Call KMS GenerateDataKey → Get plaintext + encrypted data key
2. Use plaintext data key to encrypt your file locally
3. Store encrypted file + encrypted data key together
4. Delete plaintext data key from memory

DECRYPTION:
1. Call KMS Decrypt with encrypted data key → Get plaintext data key
2. Use plaintext data key to decrypt file
3. Delete plaintext data key from memory

┌─────────────────────────────────────────────────────────┐
│  KMS Master Key (never leaves KMS)                      │
│       │                                                 │
│       ├──▶ Encrypts Data Key                            │
│       │                                                 │
│  Data Key (plaintext) ──▶ Encrypts actual data          │
│                                                         │
│  Stored: Encrypted Data + Encrypted Data Key            │
└─────────────────────────────────────────────────────────┘
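The envelope flow above can be walked through end to end without AWS access. This is an illustrative sketch only: `FakeKms` is a hypothetical in-memory stand-in for KMS, and `_keystream_xor` is a toy cipher built from HMAC-SHA256 so the example needs nothing beyond the standard library. Real code would call KMS `GenerateDataKey` and `Decrypt` and use an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only. NOT suitable for real data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class FakeKms:
    """Hypothetical stand-in for KMS: the master key never leaves this object."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)

    def generate_data_key(self):
        """Return (plaintext data key, data key encrypted under the master key)."""
        plaintext_key = secrets.token_bytes(32)
        return plaintext_key, _keystream_xor(self._master_key, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        return _keystream_xor(self._master_key, encrypted_key)


def encrypt_file(kms: FakeKms, data: bytes) -> dict:
    plaintext_key, encrypted_key = kms.generate_data_key()   # step 1
    ciphertext = _keystream_xor(plaintext_key, data)          # step 2: local encryption
    del plaintext_key                                         # step 4: drop the plaintext key
    return {"ciphertext": ciphertext, "encrypted_key": encrypted_key}  # step 3


def decrypt_file(kms: FakeKms, envelope: dict) -> bytes:
    plaintext_key = kms.decrypt(envelope["encrypted_key"])
    return _keystream_xor(plaintext_key, envelope["ciphertext"])


if __name__ == "__main__":
    kms = FakeKms()
    envelope = encrypt_file(kms, b"large file contents " * 1000)
    print(len(envelope["ciphertext"]), len(envelope["encrypted_key"]))
    print(decrypt_file(kms, envelope)[:20])
```

Only the 32-byte data key ever crosses the KMS boundary, no matter how large the file is, which is the whole point of the envelope approach.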
Service | Encryption Options | Exam Tip |
---|---|---|
S3 | SSE-S3, SSE-KMS, SSE-C | SSE-KMS for audit trail + key control |
EBS | Encrypted with KMS | Encrypted EBS → Encrypted snapshots (different regions need re-encryption) |
RDS | Encrypt at rest with KMS | Can't encrypt an existing DB in place; snapshot → copy the snapshot with encryption enabled → restore |
Lambda | Environment variables encrypted with KMS | Default or custom CMK |
DynamoDB | Encryption at rest (default AWS managed, optional CMK) | CMK for compliance + audit |
Feature | Secrets Manager | Systems Manager Parameter Store |
---|---|---|
Purpose | Store & rotate secrets | Store config & secrets |
Pricing | $0.40/secret/month + $0.05/10k API calls | Free (Standard), $0.05/parameter/month (Advanced) |
Automatic Rotation | ✅ Built-in for RDS, Redshift, DocumentDB | ❌ Manual only |
Cross-account Access | ✅ Via resource policy | ⚠️ Advanced-tier parameters only, shared through AWS RAM |
Size Limit | 64 KB | 4 KB (Standard), 8 KB (Advanced) |
Encryption | Always encrypted with KMS | Optional KMS encryption (SecureString) |
Versioning | ✅ Automatic | ✅ Yes |
Best For | Database passwords, API keys (need rotation) | App config, feature flags, static secrets |
Question: Do you need AUTOMATIC ROTATION?
│
├─ YES → Secrets Manager
│        └─ Examples: RDS passwords, 3rd party API keys
│
└─ NO → Do you need to store secrets?
        │
        ├─ YES → Parameter Store (SecureString)
        │        └─ Examples: Static passwords, connection strings
        │
        └─ NO (just config) → Parameter Store (String)
                 └─ Examples: App settings, feature flags, URLs
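The decision tree above, combined with the size limits from the comparison table, can be expressed as a small helper. This is a sketch with a hypothetical function name; sending anything over 8 KB to Secrets Manager is an inference from the listed limits rather than a rule stated in the tree.

```python
def choose_secret_store(needs_rotation: bool, is_secret: bool,
                        size_kb: float = 1.0) -> str:
    """Pick a store using the rotation question first, then the size limits."""
    if size_kb > 64:
        raise ValueError("value exceeds the 64 KB Secrets Manager limit")
    # Rotation is built in only for Secrets Manager; Parameter Store tops out at 8 KB.
    if needs_rotation or size_kb > 8:
        return "Secrets Manager"
    tier = "Standard" if size_kb <= 4 else "Advanced"
    kind = "SecureString" if is_secret else "String"
    return f"Parameter Store ({kind}, {tier} tier)"


if __name__ == "__main__":
    print(choose_secret_store(needs_rotation=True, is_secret=True))
    print(choose_secret_store(needs_rotation=False, is_secret=True))
    print(choose_secret_store(needs_rotation=False, is_secret=False, size_kb=6))
```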
Secrets Manager (RDS password, auto-rotate every 30 days)
↓
Lambda function (retrieves secret at runtime)
↓
RDS (connects with fresh password)
Exam Keywords: "rotate database password", "Lambda + RDS security"

Account A: Secrets Manager secret
↓ (resource policy allows Account B)
Account B: Lambda function retrieves secret
Exam Keywords: "cross-account secret access", "central secrets management"
Question | Answer |
---|---|
Who deleted this S3 bucket? | CloudTrail (API audit) |
Is my security group compliant with our policy? | Config (compliance rules) |
Detect if EC2 instance is compromised | GuardDuty (threat detection) |
Protect web app from SQL injection | WAF (web firewall) |
Rotate RDS password automatically | Secrets Manager |
Encrypt data at rest | KMS (encryption keys) |
Audit all KMS key usage | CloudTrail (KMS is integrated) |
Store app config (non-sensitive) | Parameter Store |
Trap | Wrong Answer | Correct Answer |
---|---|---|
Rotate database password automatically | Parameter Store | Secrets Manager (has built-in rotation) |
Encrypt existing RDS database | Enable encryption on existing DB | Can't encrypt in-place! Snapshot → encrypted copy of the snapshot → restore |
Audit who accessed S3 objects | Config | CloudTrail Data Events + S3 Server Access Logging |
Cross-account secret sharing | Parameter Store | Secrets Manager (supports resource policy) |
Encrypt large file (1 GB) with KMS | Call KMS Encrypt directly | Use Envelope Encryption (GenerateDataKey) |
Detect compromised EC2 instance | CloudWatch Alarms | GuardDuty (ML-based threat detection) |
Store 10 KB secret | Parameter Store | Secrets Manager (64 KB limit; Parameter Store tops out at 8 KB even in the Advanced tier) |
Need to store credentials?
├─ Need rotation? → Secrets Manager
└─ No rotation? → Parameter Store

Need audit trail?
├─ WHO did WHAT? → CloudTrail
└─ Is config compliant? → Config

Need to detect threats? → GuardDuty
Need to protect web app? → WAF
Feature | SQS (Simple Queue Service) | SNS (Simple Notification Service) |
---|---|---|
Model | Queue (Pull-based) | Topic (Push-based) |
Communication | One-to-one. A message is processed by one consumer. | One-to-many (Fan-out). A message is sent to all subscribers. |
Use Case | Decouple applications, buffer requests, throttle workloads. | Send notifications, trigger parallel processing. |
Use SNS to send a single message to multiple SQS queues, allowing different parts of your application to process the same event in parallel, reliably.
Event Source --> SNS Topic --> SQS Queue A --> Processor A
                          \--> SQS Queue B --> Processor B
                          \--> SQS Queue C --> Processor C
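The fan-out behavior shown above can be modeled in memory to make the push/pull split concrete. This is a toy simulation, not the AWS SDK: `Topic` and `Queue` are hypothetical stand-ins for SNS and SQS, with no delivery retries, visibility timeouts, or message attributes.

```python
from collections import deque


class Queue:
    """Stand-in for an SQS queue: consumers pull, each message is taken once."""

    def __init__(self, name: str):
        self.name = name
        self._messages = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self):
        """Return the next message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


class Topic:
    """Stand-in for an SNS topic: every publish is pushed to all subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, queue: Queue) -> None:
        self._subscribers.append(queue)

    def publish(self, message: str) -> int:
        for queue in self._subscribers:
            queue.send(message)
        return len(self._subscribers)


if __name__ == "__main__":
    topic = Topic()
    queues = [Queue("A"), Queue("B"), Queue("C")]
    for q in queues:
        topic.subscribe(q)
    topic.publish("order-created:42")
    for q in queues:
        print(q.name, q.receive())
```

Each queue holds its own copy, so a slow or failed Processor B does not block Processors A and C.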
~5-7% of total exam
Define infrastructure as code (IaC)
★★★★ IMPORTANT
┌────────────────────────────────────────────────────────────────┐
│         TRADITIONAL (Manual) vs. CloudFormation (IaC)          │
├────────────────────────────────────────────────────────────────┤
│  Click in console               Write template (YAML/JSON)     │
│  Manual steps                   Automated deployment           │
│  Hard to replicate              Version controlled             │
│  Prone to errors                Consistent & repeatable        │
│  No audit trail                 Full change history            │
└────────────────────────────────────────────────────────────────┘

CLOUDFORMATION WORKFLOW:

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Template   │─────▶│    Stack     │─────▶│  Resources   │
│ (YAML/JSON)  │      │  (Created)   │      │  (Running)   │
└──────────────┘      └──────────────┘      └──────────────┘
                             │
                             ├─ Update Stack → Change Set (preview)
                             ├─ Delete Stack → All resources deleted
                             └─ Rollback on failure (automatic)
Use DependsOn to enforce creation order.

Resources:
  MyDB:
    Type: AWS::RDS::DBInstance
    DependsOn: MySecurityGroup  # Wait for SG first
    Properties: ...

Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot  # Create snapshot before deleting

Use Snapshot to preserve data before replacement.

# Stack A (Network Stack)
Outputs:
  VPCId:
    Value: !Ref MyVPC
    Export:
      Name: NetworkStack-VPCID

# Stack B (App Stack)
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: App tier security group
      VpcId: !ImportValue NetworkStack-VPCID  # Import the VPC ID exported by Stack A
Tool | Purpose | Best For | Exam Relevance |
---|---|---|---|
CloudFormation | Infrastructure as Code | Define entire infrastructure, multi-resource stacks | ★★★★★ |
Elastic Beanstalk | Platform as a Service (PaaS) | Deploy apps quickly without infrastructure knowledge | ★★★ Know when to use |
SAM (Serverless Application Model) | Serverless IaC (simplified CloudFormation) | Lambda, API Gateway, DynamoDB serverless apps | ★★ Aware of existence |
CDK (Cloud Development Kit) | Define infrastructure with programming languages | Developers who prefer code over YAML/JSON | ★ Mention in passing |
OpsWorks | Configuration management (Chef/Puppet) | Complex app configuration, legacy systems | ★ Low priority |
CloudFormation:
- Full control over infrastructure
- Complex multi-tier architectures
- Need to manage 10+ AWS resources together
- Version control and audit trail required

Elastic Beanstalk:
- "Just deploy my app, I don't care about infrastructure"
- Standard web apps (Node.js, Python, Java, .NET, PHP, Ruby, Go)
- Auto-scaling, load balancing handled automatically
- Developer-friendly, less control

SAM:
- Serverless applications (Lambda + API Gateway + DynamoDB)
- Simplified syntax for serverless (less boilerplate than CloudFormation)
- Local testing with SAM CLI

Exam Decision Tree:
Question mentions "infrastructure as code" + complex resources → CloudFormation
Question mentions "developer wants to deploy app easily" → Elastic Beanstalk
Question mentions "serverless" + "simplified template" → SAM
Issue | Cause | Solution |
---|---|---|
Stack stuck in CREATE_IN_PROGRESS | Resource creation timeout or dependency issue | Check CloudFormation Events, verify resource limits (e.g., VPC limit) |
Stack rollback on create | Resource creation failed | Check Events tab for error details, fix template, retry |
UPDATE_ROLLBACK_FAILED | Rollback itself failed (e.g., resource manually deleted) | Use Continue Update Rollback, manually fix issue |
Can't delete stack (DELETE_FAILED) | Resource has dependencies or can't be deleted | Check Events, manually delete problem resource, retry stack delete |
Drift detected | Someone manually changed resources | Import drift (update template) or revert manual change |
Can't update (exports in use) | Another stack imports this stack's exports | Delete dependent stack first, or change export name |
Trap | Wrong Answer | Correct Answer |
---|---|---|
Preserve database on stack deletion | Use Retain on stack | Set DeletionPolicy: Retain or Snapshot on resource |
Preview changes before update | Update stack directly | Create Change Set first, review, then execute |
Deploy to 50 accounts | Run CloudFormation 50 times | Use StackSets for multi-account deployment |
Reuse VPC template across stacks | Copy-paste template code | Use Nested Stacks or Cross-Stack References |
Detect manual infrastructure changes | Manually compare | Use Drift Detection |
Resource replacement will cause downtime | Update immediately | Check Change Set, use blue/green deployment pattern |
Need automation?
├─ Full infrastructure control? → CloudFormation
├─ Just deploy app easily? → Elastic Beanstalk
└─ Serverless app? → SAM (or CloudFormation)

CloudFormation features:
├─ Preview changes? → Change Set
├─ Preserve data on delete? → DeletionPolicy: Retain/Snapshot
├─ Multi-account/region? → StackSets
├─ Reusable components? → Nested Stacks
└─ Detect manual changes? → Drift Detection
~3-5% of total exam
Run Docker containers on AWS
★★★ GOOD TO KNOW
┌────────────────────────────────────────────────────────────────┐
│                     AWS CONTAINER SERVICES                     │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ECS (Elastic Container Service)                               │
│  ├─ ECS on EC2: You manage EC2 instances                       │
│  │  └─ More control, can use Reserved Instances/Spot           │
│  └─ ECS on Fargate: Serverless, AWS manages infrastructure     │
│     └─ No EC2 management, pay per task                         │
│                                                                │
│  EKS (Elastic Kubernetes Service)                              │
│  ├─ Managed Kubernetes for complex container orchestration     │
│  └─ Use if you need Kubernetes, multi-cloud portability        │
│                                                                │
│  ECR (Elastic Container Registry)                              │
│  ├─ Docker image registry (like Docker Hub)                    │
│  └─ Store and manage container images                          │
└────────────────────────────────────────────────────────────────┘

ARCHITECTURE:

┌──────────────────────────────────────────────────────────────┐
│  ALB (Load Balancer)                                         │
│         ↓                                                    │
│  ECS Service (maintains desired count of tasks)              │
│         ↓                                                    │
│  ECS Tasks (running containers)                              │
│  ├─ Task 1 (Container A + Container B)                       │
│  ├─ Task 2 (Container A + Container B)                       │
│  └─ Task 3 (Container A + Container B)                       │
│         ↓                                                    │
│  Launch Type: EC2 (your instances) or Fargate (serverless)   │
└──────────────────────────────────────────────────────────────┘
Feature | ECS on EC2 | ECS on Fargate |
---|---|---|
Infrastructure | You manage EC2 instances | Serverless, AWS manages |
Pricing | Pay for EC2 instances (RI/Spot available) | Pay per vCPU + memory per second |
Use Case | Need control, optimize cost with RI/Spot, large workloads | Simplicity, no ops, variable workloads |
Scaling | Scale EC2 instances (slower) + tasks | Scale tasks instantly |
Networking | EC2 instance network | Each task has its own ENI (Elastic Network Interface) |
Storage | EBS volumes | Ephemeral (20 GiB by default, configurable up to 200 GiB) or EFS |
Scenario | Use Lambda | Use ECS/Fargate | Use EKS |
---|---|---|---|
Short-lived tasks (< 15 min) | ✅ Perfect fit | ❌ Overkill | ❌ Too complex |
Long-running services (24/7) | ❌ Expensive | ✅ Ideal | ✅ If need K8s |
Need Docker/containers | ⚠️ Can use container images | ✅ Native support | ✅ Native support |
Microservices architecture | ✅ Serverless microservices | ✅ Containerized microservices | ✅ Complex orchestration |
Need Kubernetes | ❌ Not supported | ❌ Not Kubernetes | ✅ Managed Kubernetes |
Multi-cloud portability | ❌ AWS-specific | ⚠️ Docker portable | ✅ K8s is portable |
Need to run containers?
│
├─ NO → Use EC2 or Lambda
│
└─ YES → Do you already use Kubernetes?
         │
         ├─ YES → EKS (Elastic Kubernetes Service)
         │
         └─ NO → Do you want to manage EC2 instances?
                 │
                 ├─ YES → ECS on EC2 (cost optimization with RI/Spot)
                 │
                 └─ NO → ECS on Fargate (serverless, simplest)
ALB (Path-based routing)
├─ /api/users → ECS Service A (User microservice)
├─ /api/orders → ECS Service B (Order microservice)
└─ /api/products → ECS Service C (Product microservice)

Each service:
- ECS Service with Auto Scaling
- Fargate tasks in multiple AZs
- Connected to RDS/DynamoDB

Exam Keywords: "microservices", "containerized", "path-based routing"

S3 Event → EventBridge → ECS Task (Fargate, run once)
↓
Process file → Output to S3
Use Case: Video transcoding, image processing, data transformation
Exam Keywords: "batch", "event-driven containers", "run once"
~3-4% of total exam
Collect, process, analyze real-time streaming data
★★★ GOOD TO KNOW
┌────────────────────────────────────────────────────────────────┐
│                         KINESIS FAMILY                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  Kinesis Data Streams                                          │
│  ├─ Real-time data streaming (custom processing)               │
│  ├─ Retain data 1-365 days                                     │
│  ├─ Manual scaling (shards)                                    │
│  └─ Use: Custom real-time analytics, complex processing        │
│                                                                │
│  Kinesis Data Firehose                                         │
│  ├─ Load streaming data to destinations                        │
│  ├─ Near real-time (60s buffer)                                │
│  ├─ Auto-scaling (serverless)                                  │
│  └─ Use: Load data to S3, Redshift, Elasticsearch, Splunk      │
│                                                                │
│  Kinesis Data Analytics                                        │
│  ├─ SQL queries on streaming data                              │
│  ├─ Real-time dashboards                                       │
│  └─ Use: Real-time metrics, anomaly detection                  │
│                                                                │
│  Kinesis Video Streams                                         │
│  ├─ Capture, process, store video streams                      │
│  └─ Use: IoT, security cameras, ML on video                    │
└────────────────────────────────────────────────────────────────┘

DATA FLOW COMPARISON:

┌───────────────────────────────────────────────────────────────┐
│ Kinesis Data Streams:                                         │
│ Producers → Data Stream (shards) → Consumers (custom apps)    │
│             ↓ (store 1-365 days)                              │
│                                    Lambda/KCL                 │
│                                                               │
│ Kinesis Firehose:                                             │
│ Producers → Firehose → S3/Redshift/Elasticsearch/Splunk       │
│             ↓ (optional Lambda transform)                     │
│             No storage, direct delivery                       │
└───────────────────────────────────────────────────────────────┘
Feature | Kinesis Data Streams | Kinesis Firehose |
---|---|---|
Purpose | Custom real-time processing | Load data to destinations |
Real-time | ✅ Real-time (70-200 ms) | ⚠️ Near real-time (60s+ buffer) |
Data Retention | ✅ 1-365 days | ❌ No retention |
Scaling | Manual (add/remove shards) | Auto-scaling (serverless) |
Consumers | Custom (Lambda, KCL, Analytics) | Fixed destinations (S3, Redshift, ES, Splunk) |
Replay | ✅ Can replay data (retained) | ❌ Cannot replay |
Complexity | More complex (manage shards) | Simpler (fully managed) |
Use Case | Custom analytics, ML, complex processing | Load to S3/Redshift for analysis |
IoT Devices → Kinesis Data Streams → Lambda (process) → DynamoDB
↓
Kinesis Data Analytics (SQL) → QuickSight (Dashboard)
Use Case: IoT sensor data, real-time monitoring, live dashboards
Exam Keywords: "real-time analytics", "streaming data", "dashboard"

Application Logs → Kinesis Firehose → S3 (Data Lake)
                         ↓                  ↓
                  (optional Lambda)    Athena (query)
                                            ↓
                                    QuickSight (reports)
Use Case: Log aggregation, batch analytics, data warehousing
Exam Keywords: "load to S3", "data lake", "batch analysis"

Data Sources → Kinesis Firehose → Lambda (transform) → Redshift
↓
S3 backup
Use Case: Data warehouse, business intelligence
Exam Keywords: "transform and load", "Redshift", "ETL"
Trap | Wrong Answer | Correct Answer |
---|---|---|
Load streaming data to S3 | Data Streams + custom Lambda | Firehose (simplest, fully managed) |
Real-time processing < 1 second | Firehose | Data Streams (Firehose has 60s+ buffer) |
Replay data from yesterday | Firehose | Data Streams (Firehose has no retention) |
Custom ML processing on stream | Firehose | Data Streams + Lambda/KCL |
SQL query on streaming data | Athena | Kinesis Data Analytics (Athena is for S3) |
High throughput, need to scale | Firehose (auto-scales) | Both work; Firehose simpler if loading to destination |
Streaming data scenario:
├─ Need to load to S3/Redshift/ES? → Firehose
├─ Need real-time < 1s? → Data Streams
├─ Need to replay data? → Data Streams
├─ SQL queries on stream? → Data Analytics
└─ Custom ML/processing? → Data Streams + Lambda
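When the answer is Data Streams in provisioned mode, the follow-up question is often how many shards are needed. The sketch below is a rough estimate under stated assumptions: the per-shard limits used (1 MB/s or 1,000 records/s in, 2 MB/s out for shared-throughput consumers) are standard Kinesis quotas not listed in this guide, and the function name is hypothetical.

```python
import math

# Per-shard quotas for provisioned streams (assumed, see lead-in).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(ingest_mb_per_sec: float, records_per_sec: int,
                  egress_mb_per_sec: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(ingest_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(egress_mb_per_sec / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # Ingest volume and read volume both require 5 shards here.
    print(shards_needed(4.5, 3000, 9.0))
    # Many tiny records: the record-count limit dominates.
    print(shards_needed(0.2, 2500, 0.4))
```

Whichever limit demands the most shards wins, so small but frequent records can require more shards than their byte volume suggests.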
Service | EFS (Elastic File System) | FSx for Windows | FSx for Lustre |
---|---|---|---|
Protocol | NFS (Linux) | SMB (Windows) | Lustre (High-Performance Computing) |
Use Case | Shared file storage for Linux-based EC2 instances, Lambda. | Shared file storage for Windows-based applications. | High-performance computing, machine learning, big data. |
Scenario | Use S3 (Object) | Use EBS (Block) | Use EFS (File) |
---|---|---|---|
Website static assets (images, videos) | ✅ Perfect fit | ❌ Wrong use case | ❌ Overkill |
Boot volume for an EC2 instance | ❌ Cannot be a boot volume | ✅ Required | ❌ Cannot be a boot volume |
Shared file system for many Linux EC2s | ❌ Not a file system | ❌ Single instance only | ✅ Perfect fit |
Store backups and archives | ✅ Cost-effective | ⚠️ Expensive | ⚠️ Expensive |
Scenario | Use RDS (Relational) | Use DynamoDB (NoSQL) |
---|---|---|
Need complex queries, JOINs, transactions | ✅ Full SQL support | ❌ Limited query patterns |
Need extreme read/write scale with simple lookups | ⚠️ Can be a bottleneck | ✅ Scales massively |
Application has unpredictable traffic | ⚠️ Manual scaling | ✅ On-demand auto-scaling |
Schema is flexible or changes often | ❌ Rigid schema | ✅ Schemaless |
Start: What are your RTO/RPO requirements?
(RTO: Recovery Time Objective, RPO: Recovery Point Objective)
|
+-- RTO: Hours, RPO: Hours (Low cost, tolerant of downtime)
|   `-- Backup and Restore
|       `-- Regularly back up data (S3, EBS Snapshots) and restore to a new region when needed.
|
+-- RTO: Tens of minutes, RPO: Minutes (Core services running)
|   `-- Pilot Light
|       `-- Replicate data to the DR region. Keep a minimal version of the environment (the "pilot light") running.
|
+-- RTO: Minutes, RPO: Seconds (Scaled-down, fully functional copy)
|   `-- Warm Standby
|       `-- A scaled-down but fully functional copy of your production environment is always running in the DR region.
|
`-- RTO: Seconds, RPO: Near-zero (Full production scale in both regions)
    `-- Multi-Site Active-Active
        `-- Traffic is served from both regions simultaneously. Use Route 53 for routing. Most expensive and complex.
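The tree above can be approximated in code by ranking each objective and taking the stricter of the two. This is only a sketch: the numeric cut-offs (one hour, ten minutes, one minute, one second) are assumptions chosen to mirror the words "hours", "tens of minutes", "minutes", and "seconds", not official thresholds, and the function names are hypothetical.

```python
STRATEGIES = [
    "Backup and Restore",
    "Pilot Light",
    "Warm Standby",
    "Multi-Site Active-Active",
]


def _tier_for_rto(rto_seconds: float) -> int:
    if rto_seconds >= 3600:   # hours
        return 0
    if rto_seconds >= 600:    # tens of minutes
        return 1
    if rto_seconds >= 60:     # minutes
        return 2
    return 3                  # seconds


def _tier_for_rpo(rpo_seconds: float) -> int:
    if rpo_seconds >= 3600:   # hours
        return 0
    if rpo_seconds >= 60:     # minutes
        return 1
    if rpo_seconds >= 1:      # seconds
        return 2
    return 3                  # near-zero


def choose_dr_strategy(rto_seconds: float, rpo_seconds: float) -> str:
    """Pick the cheapest strategy that satisfies both objectives."""
    tier = max(_tier_for_rto(rto_seconds), _tier_for_rpo(rpo_seconds))
    return STRATEGIES[tier]


if __name__ == "__main__":
    print(choose_dr_strategy(4 * 3600, 4 * 3600))
    print(choose_dr_strategy(1800, 300))
    print(choose_dr_strategy(4 * 3600, 5))
```

The last call shows why both numbers matter: a relaxed RTO does not help if the RPO is only a few seconds, because the data still has to be replicated continuously.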
Component | Solution |
---|---|
EC2 Instances | Use an Auto Scaling Group across multiple AZs. |
RDS Database | Enable Multi-AZ deployment. |
Load Balancer | ELB is highly available by default. |
Static Content | S3 is highly available by default. |
Component | Solution |
---|---|
EC2 Compute | Use Spot Instances for fault-tolerant loads; Reserved Instances/Savings Plans for steady loads. |
S3 Storage | Use Intelligent-Tiering or Lifecycle Policies to move data to cheaper classes. |
Database | Use Aurora Serverless for unpredictable workloads. |
Network Traffic | Use VPC Endpoints to avoid NAT Gateway data processing charges. |