50+ AWS Interview Questions 2025: EC2, S3, Lambda, VPC & IAM

AWS has over 200 services, but interviews focus on maybe 15 of them. The difference between candidates who pass AWS interviews and those who don't isn't breadth of knowledge—it's depth in the services that matter.

You can know what every service does and still fail if you can't explain how VPC networking actually works or when to use Lambda versus EC2. This guide covers the core AWS services that appear in nearly every cloud interview, with the questions interviewers actually ask.

AWS Fundamentals Questions
EC2 and Compute Questions
Lambda and Serverless Questions
S3 Storage Questions
EBS and Block Storage Questions
VPC Networking Questions
IAM Security Questions
RDS Database Questions
DynamoDB Questions
Architecture and Troubleshooting Questions

AWS Fundamentals Questions

Understanding AWS's global infrastructure is essential before diving into specific services.

What is the difference between a Region and an Availability Zone?

A Region is a geographic area containing multiple data centers, identified by names like us-east-1 (Virginia) or eu-west-1 (Ireland). Each region is completely independent, meaning services and data in one region don't automatically replicate to another.

An Availability Zone (AZ) is one or more discrete data centers within a region, each with independent power, cooling, and networking. AZs within a region connect via low-latency links, enabling you to build highly available applications that survive data center failures.

flowchart TB
    subgraph aws["AWS Global Infrastructure"]
        subgraph r1["Region (us-east-1)"]
            az1a["AZ (us-east-1a)"]
            az1b["AZ (us-east-1b)"]
            az1c["AZ (us-east-1c)"]
        end
        subgraph r2["Region (eu-west-1)"]
            az2a["AZ (eu-west-1a)"]
            az2b["..."]
        end
        edge["Edge Locations (200+)"]
    end

Edge Locations are points of presence that serve CloudFront (the CDN) and Route 53 (DNS) from locations closer to end users than Regions are.

Why should you deploy across multiple Availability Zones?

Deploying across multiple AZs provides high availability for your applications. If one AZ fails, whether from a power outage, network issues, or hardware problems, your application keeps running in the remaining AZs, provided the load balancer routes around the failed zone and the surviving AZs have enough capacity to absorb the traffic.

Single-AZ deployments create a single point of failure. Even if your application code is perfect, an AZ-level event will take it offline. Multi-AZ deployment is the foundation of building resilient systems on AWS.

What is the scope of different AWS services?

Services operate at different scopes, which affects how you architect solutions and what you need to create in each region you use. Understanding this prevents common mistakes like assuming S3 data replicates automatically across regions.

Scope	Examples	Implication
Global	IAM, Route 53, CloudFront	Configured once, available in every region
Regional	S3, Lambda, VPC	Must be created in each region you use
AZ-scoped	EC2, EBS, Subnets	Tied to specific AZ, not automatically replicated

Common interview trap: "Is S3 regional or global?" S3 buckets are regional (data stays in the region), but bucket names are globally unique. The S3 console shows all buckets globally, which confuses people.

What are the six pillars of the AWS Well-Architected Framework?

The Well-Architected Framework provides a consistent approach for evaluating architectures against AWS best practices. It launched with five pillars, and Sustainability was added as the sixth in 2021. Interviewers expect you to know these pillars and frame your architecture answers around them.

Pillar	Focus
Operational Excellence	Run and monitor systems, continuous improvement
Security	Protect information, systems, and assets
Reliability	Recover from failures, meet demand
Performance Efficiency	Use resources efficiently
Cost Optimization	Avoid unnecessary costs
Sustainability	Minimize the environmental impact of running workloads

When answering architecture questions, reference these pillars to demonstrate mature thinking about trade-offs.

EC2 and Compute Questions

EC2 (Elastic Compute Cloud) provides virtual servers and remains the foundation of most AWS architectures.

How do EC2 instance types work and how do you choose the right one?

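EC2 instance types are named by family, generation, and size (e.g., m5.xlarge). The family letter indicates the use case and the number indicates the generation. Optional letters after the generation mark attributes, for example g for AWS Graviton processors in m7g or d for local NVMe storage in r5d. The size determines the vCPU and memory allocation.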
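A small parser makes the naming scheme concrete. This is a simplified sketch that assumes names of the form `<family letters><generation digits><optional attribute letters>.<size>`. It does not cover every type AWS sells (for example `u-6tb1.metal` or `mac2-m2pro.metal` are rejected), and the helper name `parse_instance_type` is ours.

```python
import re
from typing import NamedTuple

# Simplified pattern: family letters, generation number, optional attribute
# suffix letters (e.g. "g" = Graviton, "d" = local NVMe), then ".size"
_PATTERN = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z]*)\.(?P<size>[a-z0-9-]+)$"
)


class InstanceType(NamedTuple):
    family: str
    generation: int
    attributes: str
    size: str


def parse_instance_type(name: str) -> InstanceType:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    return InstanceType(
        family=match["family"],
        generation=int(match["generation"]),
        attributes=match["attributes"],
        size=match["size"],
    )


print(parse_instance_type("m5.xlarge"))  # InstanceType(family='m', generation=5, attributes='', size='xlarge')
print(parse_instance_type("m7g.large"))
```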

Choosing the right instance type requires matching your workload characteristics to the instance family's strengths. Running a memory-intensive workload on a compute-optimized instance wastes money and underperforms.

Family	Use Case	Examples
M (General)	Balanced compute, memory, networking	Web servers, small databases
C (Compute)	CPU-intensive workloads	Batch processing, gaming servers
R (Memory)	Memory-intensive workloads	In-memory databases, caching
T (Burstable)	Variable workloads with burst capability	Dev environments, small apps
G/P (GPU)	Graphics, machine learning	ML training, video encoding

When would you use a T instance versus an M instance?

T instances are burstable. They earn CPU credits while running below their baseline CPU level and spend them to burst above it. This makes them cost-effective for workloads with variable CPU usage, like development environments or applications with occasional traffic spikes.

M instances provide consistent performance without the credit system. They're better for production workloads with steady, predictable CPU requirements. Running a T instance at sustained high CPU exhausts its credits. In standard mode it is then throttled to its baseline. In unlimited mode you pay for the surplus credits instead.
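The credit mechanics can be sketched as an hourly simulation of standard mode. This is a simplified model that rests on several assumptions:

- One CPU credit equals one vCPU at 100% for one minute.
- Credits accrue at `vcpus * 60 * baseline` per hour.
- The balance caps at 24 hours of earnings.
- Accounting is hourly, whereas real instances account continuously.

The function name and the sample numbers (2 vCPUs, 10% baseline, roughly t3.micro-sized) are illustrative.

```python
def simulate_standard_mode(demand, vcpus, baseline, start_balance=0.0):
    """Simulate a burstable instance in standard credit mode, hour by hour.

    demand and baseline are fractions of total instance CPU (0.0 to 1.0).
    Returns (delivered utilization per hour, final credit balance).
    """
    credits_per_full_hour = vcpus * 60          # 1 credit = 1 vCPU-minute at 100%
    earned_per_hour = credits_per_full_hour * baseline
    max_balance = earned_per_hour * 24          # assumed cap: 24 hours of earnings
    balance = start_balance
    delivered = []
    for wanted in demand:
        available = balance + earned_per_hour
        needed = wanted * credits_per_full_hour
        if needed <= available:
            balance = min(max_balance, available - needed)
            delivered.append(wanted)
        else:
            # Out of credits: the instance only gets what the credits cover,
            # which bottoms out at the baseline once the balance is empty
            delivered.append(available / credits_per_full_hour)
            balance = 0.0
    return delivered, balance


# Ten quiet hours bank credits, then sustained 100% demand drains them
delivered, balance = simulate_standard_mode([0.05] * 10 + [1.0] * 3, vcpus=2, baseline=0.10)
print([round(d, 2) for d in delivered[-3:]])  # [0.6, 0.1, 0.1]
```

In unlimited mode, which T3 and newer burstable families use by default at launch, the same shortfall is billed as surplus credits rather than throttled.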

What are the EC2 pricing models and when do you use each?

Understanding EC2 pricing directly impacts cloud costs. Many organizations overspend by using On-Demand for everything when Reserved or Spot instances would work better.

Model	Description	Best For
On-Demand	Pay per hour/second, no commitment	Short-term, unpredictable workloads
Reserved	1-3 year commitment, up to 72% discount	Steady-state, predictable workloads
Spot	Spare EC2 capacity at the current Spot price, up to 90% discount, can be interrupted	Fault-tolerant, flexible workloads
Savings Plans	Commit to $/hour usage, flexible across instance types	Similar to Reserved but more flexible
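The trade-off between On-Demand and a commitment reduces to simple arithmetic. A commitment is billed for every hour of its term whether or not the capacity runs, so it only pays off above a break-even utilization of `1 - discount`. This is a minimal sketch that assumes a flat hourly rate, a single discount figure, and a 730-hour month. The rate and discount below are hypothetical, not AWS prices.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates


def on_demand_monthly(hourly_rate: float, hours_running: float) -> float:
    # On-Demand: pay only for the hours the instance actually runs
    return hourly_rate * hours_running


def committed_monthly(hourly_rate: float, discount: float) -> float:
    # Reserved Instance or Savings Plan: discounted rate, billed for every hour
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH


def break_even_utilization(discount: float) -> float:
    # Fraction of the month the workload must run for the commitment to win
    return 1 - discount


rate, discount = 0.10, 0.40  # hypothetical $/hour and discount
print(break_even_utilization(discount))  # 0.6
print(on_demand_monthly(rate, 0.6 * HOURS_PER_MONTH))
print(committed_monthly(rate, discount))
```

With a 40% discount, a workload that runs less than about 60% of the month is cheaper On-Demand. Above that point, the commitment wins.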

How would you reduce costs for a batch processing job that can tolerate interruptions?

Spot Instances are ideal for fault-tolerant batch processing, offering up to 90% cost savings over On-Demand pricing. The trade-off is that AWS can reclaim Spot instances with a 2-minute warning when capacity is needed.

To use Spot effectively for batch jobs, implement checkpointing to save progress regularly so you can resume if interrupted. Use Spot Fleet to maintain capacity across multiple instance types and AZs, increasing the chance of getting instances even during high-demand periods.
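Checkpointing can be sketched as a loop that records the index of the next unprocessed item after each item and stops early when an interruption is signalled. This is a minimal sketch under the following assumptions:

- The checkpoint is a local JSON file for brevity. In practice it should live somewhere that survives the instance, such as S3.
- `process_with_checkpoints` and `spot_interruption_pending` are hypothetical helper names.
- The interruption check polls the instance metadata path `spot/instance-action` via IMDSv2. That path returns 404 until AWS schedules an interruption.

```python
import json
import os
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def spot_interruption_pending(timeout: float = 1.0) -> bool:
    """Poll IMDSv2 for a Spot interruption notice (only meaningful on EC2)."""
    try:
        token_request = urllib.request.Request(
            f"{IMDS}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        token = urllib.request.urlopen(token_request, timeout=timeout).read().decode()
        notice_request = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        urllib.request.urlopen(notice_request, timeout=timeout)
        return True  # 200: an interruption is scheduled
    except (urllib.error.URLError, OSError):
        return False  # 404 (no notice) or metadata service unreachable


def process_with_checkpoints(items, checkpoint_path, handle, should_stop):
    """Process items in order, resuming from the last saved checkpoint."""
    next_index = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            next_index = json.load(f)["next_index"]
    for i in range(next_index, len(items)):
        if should_stop():
            return i  # checkpoint already points at item i
        handle(items[i])
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return len(items)
```

On an instance you would pass `spot_interruption_pending` as `should_stop`. The two-minute notice then leaves time to finish the current item and exit cleanly, and a replacement instance resumes from the checkpoint.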

Lambda and Serverless Questions

Lambda runs code without provisioning servers and is central to serverless architectures on AWS.

What is AWS Lambda and what are its key characteristics?

Lambda is a serverless compute service that runs your code in response to events without requiring you to manage servers. You upload your code, configure triggers, and Lambda handles everything else—provisioning, scaling, patching, and high availability.

Key characteristics define when Lambda is appropriate:

Event-driven: Triggered by events from API Gateway, S3, SQS, DynamoDB, and dozens of other sources
Pay per invocation: Charged only for requests and compute time (GB-seconds)
Auto-scaling: Scales automatically from zero to thousands of concurrent executions
Time limit: Maximum 15 minutes per invocation
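A minimal Python handler shows the programming model. Lambda calls a function with the triggering event and a context object. For an API Gateway proxy integration, the return value becomes the HTTP response. This sketch assumes the proxy event shape, where query parameters arrive in `queryStringParameters` (which is `None` when there are none). The handler body and greeting are illustrative.

```python
import json


def lambda_handler(event, context):
    # API Gateway proxy events carry query parameters here (None if absent)
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # The proxy integration expects statusCode, headers, and a string body
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation with a hand-built event; no AWS account needed
print(lambda_handler({"queryStringParameters": {"name": "Ada"}}, None)["body"])
```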

What are Lambda cold starts and how do you reduce them?

A cold start occurs when Lambda must initialize a new execution environment for your function. This happens on the first invocation after deployment, after periods of inactivity, or when scaling up to handle more concurrent requests. Cold starts add latency—sometimes hundreds of milliseconds or more.

flowchart LR
    subgraph cold["Cold Start"]
        C1["Request"] --> C2["Initialize Runtime"] --> C3["Load Code"] --> C4["Run Handler"] --> C5["Response"]
    end
    subgraph warm["Warm Start"]
        W1["Request"] --> W2["Run Handler"] --> W3["Response"]
    end

Strategies to reduce cold starts:

Provisioned Concurrency: Keep a specified number of instances warm and ready
Smaller deployment packages: Less code to load means faster initialization
Faster runtimes: Python and Node.js typically initialize faster than Java or .NET; for Java, Lambda SnapStart can cut initialization time
Keep functions warm: Schedule periodic invocations (though Provisioned Concurrency is better)

When should you use Lambda versus EC2?

This decision depends on workload characteristics, cost considerations, and operational preferences. Neither is universally better—they serve different needs.

flowchart TD
    Q1{"Need runtime<br/>> 15 minutes?"}
    Q1 -->|Yes| A1["EC2 or ECS/EKS"]
    Q1 -->|No| Q2{"Predictable,<br/>constant traffic?"}
    Q2 -->|Yes| A2["EC2<br/>(often cheaper at scale)"]
    Q2 -->|No| Q3{"Need specific<br/>OS or runtime?"}
    Q3 -->|Yes| A3["EC2"]
    Q3 -->|No| A4["Lambda<br/>(simplicity wins)"]

When would Lambda be more expensive than EC2?

Lambda becomes more expensive than EC2 at high, constant utilization. Lambda charges per invocation and per GB-second of compute time. If you're running 24/7 at full capacity, a Reserved EC2 instance is typically much cheaper.

Lambda wins economically for variable, spiky, or low-utilization workloads. The ability to scale to zero during idle periods and pay nothing makes Lambda cost-effective for many use cases, even if the per-compute-second cost is higher.
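The break-even point is easy to estimate. This sketch assumes the commonly quoted x86 prices of $0.20 per million requests and $0.0000166667 per GB-second (check current pricing for your region and architecture). It also ignores the free tier. The EC2 figure to compare against is whatever your instance actually costs.

```python
PRICE_PER_MILLION_REQUESTS = 0.20   # assumption: verify against current pricing
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: x86, verify for your region


def lambda_monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    # Compute is billed in GB-seconds: duration multiplied by configured memory
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS + gb_seconds * PRICE_PER_GB_SECOND


# Spiky: 1M requests/month at 200 ms and 512 MB
spiky = lambda_monthly_cost(1_000_000, 200, 512)
# Constant: 100 requests/second all month (730 hours) at the same size
steady = lambda_monthly_cost(100 * 3600 * 730, 200, 512)
print(f"spiky: ${spiky:.2f}/month, steady: ${steady:.2f}/month")
```

At low volume the Lambda bill is a few dollars a month. At a constant 100 requests per second it reaches the hundreds, which is where a right-sized, committed EC2 instance can come out ahead.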

S3 Storage Questions

S3 (Simple Storage Service) is object storage with no practical capacity limit, and it appears in virtually every AWS architecture.

What is S3 and what are its core concepts?

S3 stores objects (files up to 5TB) in buckets (containers with globally unique names). Unlike file systems with directories, S3 uses a flat namespace where object keys include the full path (e.g., photos/2026/vacation.jpg).

This design lets S3 scale without a practical limit on how much data you can store. S3 Standard stores objects redundantly across at least three AZs (One Zone classes excepted) and is designed for 99.999999999% (11 nines) durability.

Key concepts:

Buckets: Containers for objects (globally unique names, regional storage)
Objects: Files plus metadata, identified by keys
Keys: The full path to an object within a bucket
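Because the namespace is flat, the "folders" shown in the console are derived from key prefixes. A list call with a prefix and a delimiter groups keys into common prefixes. The function below is a local emulation of that `Prefix`/`Delimiter` behavior for illustration only. The name `list_keys` is ours, and it ignores pagination. Against a real bucket you would call `list_objects_v2` and read `Contents` and `CommonPrefixes`.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Emulate ListObjectsV2 Prefix/Delimiter grouping over a list of keys."""
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter and delimiter in remainder:
            # Everything up to the next delimiter collapses into one "folder"
            common_prefixes.add(prefix + remainder.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


keys = ["photos/2026/vacation.jpg", "photos/2026/beach.jpg", "photos/2025/ski.jpg", "readme.txt"]
print(list_keys(keys, delimiter="/"))                    # (['readme.txt'], ['photos/'])
print(list_keys(keys, prefix="photos/", delimiter="/"))  # ([], ['photos/2025/', 'photos/2026/'])
```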

What are S3 storage classes and when should you use each?

S3 offers multiple storage classes optimized for different access patterns and cost requirements. Choosing the right class significantly impacts storage costs—Glacier Deep Archive costs about 1/23rd of Standard storage.

Class	Access Pattern	Retrieval	Cost
Standard	Frequent access	Immediate	Highest
Intelligent-Tiering	Unknown pattern	Immediate	Auto-optimized
Standard-IA	Infrequent (30+ days)	Immediate	Lower + retrieval fee
One Zone-IA	Infrequent, non-critical	Immediate	Lower, single AZ
Glacier Instant Retrieval	Archive, rare access	Milliseconds	Low + retrieval fee
Glacier Flexible Retrieval	Archive	Minutes to hours	Lower
Glacier Deep Archive	Long-term archive	Within 12 hours (bulk: up to 48)	Lowest

How do S3 lifecycle policies work?

Lifecycle policies automate the transition of objects between storage classes or delete them after a specified period. This enables cost optimization without manual intervention—objects automatically move to cheaper storage as they age.

{
  "Rules": [{
    "ID": "archive-then-expire",
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 365}
  }]
}

This policy transitions objects to Standard-IA after 30 days, Glacier after 90 days, and deletes them after one year.
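To check what a rule like this does to an object of a given age, a simplified evaluator helps. This sketch makes three assumptions:

- There is a single rule.
- Transitions and expiration are day-based and counted from object creation.
- Objects are uploaded as STANDARD.

Real S3 applies rules asynchronously, so actions can lag the exact day. `storage_state` is a hypothetical helper name.

```python
import json

RULE = json.loads("""
{
  "Transitions": [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"}
  ],
  "Expiration": {"Days": 365}
}
""")


def storage_state(age_days: int, rule: dict = RULE) -> str:
    """Return the storage class (or EXPIRED) a rule implies for an object's age."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return "EXPIRED"
    state = "STANDARD"  # assumption: objects were uploaded as STANDARD
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            state = transition["StorageClass"]
    return state


for age in (10, 45, 120, 400):
    print(age, storage_state(age))
```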

How would you ensure no S3 bucket in your account is ever publicly accessible?

Enable S3 Block Public Access at the account level. This setting overrides any bucket policy or ACL that would grant public access, preventing accidental data exposure even if someone misconfigures a bucket.

Additionally, use AWS Config rules to detect and alert on S3 misconfigurations. Config can automatically remediate issues or notify your security team. Defense in depth means multiple controls protecting the same asset.
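Turning on the account-level setting is a single API call. This is a minimal sketch using boto3's `s3control` client and its `put_public_access_block` operation, with all four settings enabled. The wrapper name is ours. The client is passed in so the function can be exercised without AWS credentials, and the account ID in the usage comment is a placeholder.

```python
def block_all_public_access(s3control, account_id: str) -> dict:
    """Enable all four S3 Block Public Access settings for the whole account."""
    config = {
        "BlockPublicAcls": True,        # reject requests that add public ACLs
        "IgnorePublicAcls": True,       # ignore any public ACLs that already exist
        "BlockPublicPolicy": True,      # reject bucket policies granting public access
        "RestrictPublicBuckets": True,  # restrict buckets that already have public policies
    }
    s3control.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=config,
    )
    return config


# Usage (requires s3:PutAccountPublicAccessBlock permission):
#   import boto3
#   block_all_public_access(boto3.client("s3control"), "123456789012")
```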

What are the different S3 encryption options?

S3 supports multiple encryption methods for data at rest. The choice depends on your compliance requirements and key management preferences.

SSE-S3: AWS manages encryption keys entirely—simplest option
SSE-KMS: You control keys in AWS Key Management Service, providing audit trails and key rotation
SSE-C: You provide encryption keys with each request—AWS never stores them
Client-side: Encrypt data before uploading—AWS never sees unencrypted data

For most use cases, SSE-S3 or SSE-KMS provides sufficient security with minimal operational overhead.
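In boto3 the three server-side options map onto `put_object` parameters. Client-side encryption happens before the call, so it has no parameter here. The sketch below builds the extra arguments for each mode. `encryption_args` is a hypothetical helper, and the bucket, key, and KMS key ARN in the usage comment are placeholders. For SSE-C, boto3 normally computes the required key-MD5 header itself.

```python
def encryption_args(mode: str, key=None) -> dict:
    """Extra put_object arguments for each server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if key:  # omit to use the AWS managed key (aws/s3)
            args["SSEKMSKeyId"] = key
        return args
    if mode == "SSE-C":
        if not key:
            raise ValueError("SSE-C needs a 256-bit key supplied on every request")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": key}
    raise ValueError(f"unsupported mode: {mode}")


# Usage:
#   import boto3
#   boto3.client("s3").put_object(
#       Bucket="my-bucket", Key="report.csv", Body=b"data",
#       **encryption_args("SSE-KMS", "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE"),
#   )
```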

EBS and Block Storage Questions

EBS (Elastic Block Store) provides block storage volumes that attach to EC2 instances.

What is EBS and how does it differ from S3?

EBS provides block storage volumes that function like hard drives attached to EC2 instances. Unlike S3's object storage, EBS supports file systems and databases that require block-level operations.

Key differences from S3:

Feature	S3	EBS
Type	Object storage	Block storage
Access	HTTP API	Attach to EC2
Sharing	Any number of clients	One instance (io1/io2 Multi-Attach aside)
Scope	Regional	Single AZ
Use Case	Static files, backups, data lakes	Boot volumes, databases

What EBS volume types are available and when do you use each?

EBS offers SSD and HDD volume types optimized for different performance characteristics. Choosing the wrong type either wastes money (over-provisioning) or causes performance problems (under-provisioning).

Type	Use Case	IOPS	Throughput
gp3	General purpose SSD	Up to 16,000	Up to 1,000 MB/s
gp2	General purpose SSD (legacy)	3 IOPS/GiB (100 to 16,000), burst to 3,000	Up to 250 MB/s
io2	High-performance SSD	Up to 64,000	Up to 1,000 MB/s
st1	Throughput HDD	N/A	Up to 500 MB/s
sc1	Cold HDD	N/A	Up to 250 MB/s
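The gp2 row hides a size dependence worth being able to compute. gp2 baseline performance scales at 3 IOPS per GiB, with a minimum of 100 and a maximum of 16,000. gp3 includes 3,000 IOPS at any size and lets you provision more independently. A short sketch of that arithmetic follows; the function name is ours.

```python
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume; more can be provisioned


def gp2_baseline_iops(size_gib: int) -> int:
    # gp2: 3 IOPS per GiB, floor of 100, ceiling of 16,000
    return max(100, min(16_000, 3 * size_gib))


for size in (20, 100, 500, 1_000, 6_000):
    print(f"{size:>5} GiB  gp2 baseline={gp2_baseline_iops(size):>6}  gp3 baseline={GP3_BASELINE_IOPS}")
```

Below 1,000 GiB, a gp2 volume's baseline is under gp3's included 3,000 IOPS, and small gp2 volumes rely on burst credits to reach 3,000. This is one reason gp3 is the usual default for new volumes.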

Which EBS volume type would you use for a database needing consistent high IOPS?

io2 is the right choice for databases requiring consistent high IOPS. While gp3 provides up to 16,000 IOPS, io2 delivers up to 64,000 IOPS with provisioned performance guarantees, and io2 Block Express goes higher still for extreme workloads.

For high-transaction OLTP databases, io2's provisioned IOPS keeps latency consistent under load. The higher cost is justified when database performance directly impacts business operations.

How do EBS snapshots work?

EBS snapshots are point-in-time backups stored in S3 (managed by AWS, not visible in your buckets). Snapshots are incremental—only blocks changed since the last snapshot are stored—making them space and cost efficient.

Snapshots are regional but can be copied cross-region for disaster recovery. You can create new volumes from snapshots in any AZ within the region, enabling data migration between AZs.
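Incremental snapshots can be modeled as storing only the blocks that differ from what earlier snapshots already hold. Restoring then means replaying the chain oldest first. This is a conceptual model with hypothetical helper names, not how EBS is implemented internally. It also shows why EBS keeps blocks that later snapshots still reference when you delete an older one.

```python
def take_snapshot(volume: dict, chain: list) -> dict:
    """Store only blocks that changed since the state the chain already captures."""
    captured = restore(chain)
    return {block: data for block, data in volume.items() if captured.get(block) != data}


def restore(chain: list) -> dict:
    """Rebuild a full volume image by applying snapshots oldest first."""
    volume = {}
    for snapshot in chain:
        volume.update(snapshot)
    return volume


volume = {0: "boot", 1: "data-v1", 2: "logs"}
chain = [take_snapshot(volume, [])]          # first snapshot: every block
volume[1] = "data-v2"
chain.append(take_snapshot(volume, chain))   # second snapshot: only block 1
print([len(s) for s in chain])               # [3, 1]
print(restore(chain) == volume)              # True
# Naively dropping chain[0] would lose blocks 0 and 2, which is why EBS
# keeps any data a remaining snapshot still depends on when you delete one.
```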

VPC Networking Questions

VPC (Virtual Private Cloud) is the foundation of AWS networking. EC2 instances and RDS databases run inside a VPC, and Lambda functions can be attached to one when they need to reach private resources.

What is a VPC and what are its core components?

A VPC is your isolated network in AWS where you define IP ranges, create subnets, configure routing, and control security. Think of it as your own data center network in the cloud, but with AWS managing the physical infrastructure.

Core components work together to create network topology:

VPC: The overall network boundary with a CIDR block (e.g., 10.0.0.0/16)
Subnet: A segment within a VPC, existing in a single AZ
Route Table: Rules determining where network traffic goes
Internet Gateway (IGW): Enables communication with the internet
NAT Gateway: Allows private resources to reach the internet without being reachable from it

flowchart TB
    subgraph vpc["VPC (10.0.0.0/16)"]
        subgraph aza["AZ-a"]
            pub1["Public Subnet<br/>10.0.1.0/24"]
            priv1["Private Subnet<br/>10.0.3.0/24"]
            nat["NAT Gateway"]
        end
        subgraph azb["AZ-b"]
            pub2["Public Subnet<br/>10.0.2.0/24"]
            priv2["Private Subnet<br/>10.0.4.0/24"]
        end
        igw["Internet Gateway"]
    end
 
    pub1 -->|"0.0.0.0/0"| igw
    pub2 -->|"0.0.0.0/0"| igw
    priv1 -->|"0.0.0.0/0"| nat
    priv2 -->|"0.0.0.0/0"| nat
    nat --> igw
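The CIDR layout in the diagram can be checked with Python's standard `ipaddress` module. The sketch below uses a hypothetical `check_subnet_plan` helper that performs three steps:

1. Verify that each subnet sits inside the VPC range.
2. Verify that no two subnets overlap.
3. Report usable addresses after the five AWS reserves in every subnet (the first four and the last).

```python
import ipaddress
from itertools import combinations

AWS_RESERVED_PER_SUBNET = 5  # network address, VPC router, DNS, future use, broadcast


def check_subnet_plan(vpc_cidr: str, subnet_cidrs: list) -> dict:
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is outside {vpc}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            raise ValueError(f"{a} overlaps {b}")
    # Usable host addresses per subnet after AWS's reservations
    return {str(s): s.num_addresses - AWS_RESERVED_PER_SUBNET for s in subnets}


plan = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24", "10.0.4.0/24"]
print(check_subnet_plan("10.0.0.0/16", plan))  # each /24 leaves 251 usable addresses
```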

What is the difference between public and private subnets?

The distinction between public and private subnets determines what can be reached from the internet and forms the basis of network security architecture.

Characteristic	Public Subnet	Private Subnet
Route to IGW	Yes	No
Public IP	Can have	No
Reachable from internet	Yes (with a public IP and permitting rules)	No
Can reach internet	Yes	Via NAT Gateway
Typical use	Load balancers, bastion hosts	Application servers, databases

A subnet becomes "public" by having a route table entry pointing 0.0.0.0/0 to an Internet Gateway.
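That rule is mechanical enough to check in code. The sketch below classifies a subnet from the `Routes` list of its route table, using the field names the EC2 `DescribeRouteTables` API returns (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`). The helper name and resource IDs are ours. A subnet with no explicit route table association uses the VPC's main route table.

```python
def classify_subnet(routes: list) -> str:
    """Classify a subnet by where its default route (0.0.0.0/0) points."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
        if route.get("NatGatewayId"):
            return "private (egress via NAT)"
    return "private (no internet route)"


local = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}
print(classify_subnet([local, {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}]))    # public
print(classify_subnet([local, {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def"}]))  # private (egress via NAT)
print(classify_subnet([local]))                                                                    # private (no internet route)
```

With boto3, the routes for a subnet come from `describe_route_tables(Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}])`.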

Why should you put your database in a private subnet?

Placing databases in private subnets provides defense in depth. Private subnets have no route to the Internet Gateway, so even if security groups are misconfigured, the database isn't directly reachable from the internet.

This architecture requires attackers to first compromise a resource in the public subnet before they can even attempt to reach the database. Combined with proper security groups, this significantly reduces attack surface.

What is the difference between Security Groups and NACLs?

Security Groups and Network ACLs (NACLs) both control traffic but operate at different levels and with different behaviors. Understanding both is essential for troubleshooting connectivity issues.

Feature	Security Group	NACL
Scope	Instance level	Subnet level
Rules	Allow only	Allow and Deny
Statefulness	Stateful	Stateless
Evaluation	All rules evaluated	Rules evaluated in rule-number order; first match wins
Default	Deny all inbound, allow all outbound	Default NACL allows all; new custom NACLs deny all

Stateful vs Stateless is the key difference: Security Groups automatically allow return traffic for allowed connections. NACLs require explicit rules for both directions.
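Statelessness and rule ordering are easiest to see in a small model. The sketch below uses a hypothetical helper, simplified to a single peer CIDR and a port range per rule. It evaluates NACL rules in ascending rule-number order, stops at the first match, and falls through to the implicit deny. Because NACLs are stateless, the response to an allowed HTTPS request still needs its own outbound rule covering the client's ephemeral port.

```python
import ipaddress


def nacl_decision(rules: list, peer_ip: str, port: int) -> str:
    """First matching rule by ascending rule number wins; otherwise deny.

    peer_ip is the source for inbound rules and the destination for outbound.
    """
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        in_range = ipaddress.ip_address(peer_ip) in ipaddress.ip_network(rule["cidr"])
        if in_range and low <= port <= high:
            return rule["action"]
    return "deny"  # the implicit '*' rule at the end of every NACL


inbound = [
    {"number": 90, "cidr": "203.0.113.0/24", "ports": (443, 443), "action": "deny"},
    {"number": 100, "cidr": "0.0.0.0/0", "ports": (443, 443), "action": "allow"},
]
outbound = [{"number": 100, "cidr": "0.0.0.0/0", "ports": (443, 443), "action": "allow"}]

client, client_port = "198.51.100.7", 51_000
print(nacl_decision(inbound, client, 443))           # allow: request gets in
print(nacl_decision(outbound, client, client_port))  # deny: response blocked
outbound.append({"number": 110, "cidr": "0.0.0.0/0", "ports": (1024, 65535), "action": "allow"})
print(nacl_decision(outbound, client, client_port))  # allow once ephemeral ports are open
print(nacl_decision(inbound, "203.0.113.9", 443))    # deny: rule 90 matches before 100
```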

Traffic is blocked even though the Security Group allows it. What could cause this?

When traffic is blocked despite correct Security Group rules, investigate these possibilities in order:

NACL blocking: NACLs evaluate before Security Groups at the subnet level
Route table issues: Traffic might not be routed correctly to reach the destination
Ephemeral ports: NACLs need rules for ephemeral port ranges (1024-65535) for return traffic
Source IP: NAT changes source IP—the Security Group might expect a different source
Missing IGW/NAT: Infrastructure might not exist or be attached

What are the options for connecting VPCs together?

AWS provides several connectivity options depending on scale, latency, and security requirements.

Method	Use Case
VPC Peering	Connect two VPCs (same or different accounts/regions)
Transit Gateway	Hub-and-spoke for multiple VPCs
VPN	Encrypted connection to on-premises over internet
Direct Connect	Dedicated private connection to on-premises
PrivateLink	Access AWS services or your services without internet

IAM Security Questions

IAM (Identity and Access Management) controls who can do what in your AWS account. Security questions appear in every AWS interview.

What are the core IAM concepts?

IAM provides authentication (who you are) and authorization (what you can do) for AWS. Understanding these concepts is fundamental to AWS security.

Users: Individual identities with long-term credentials (password, access keys)
Groups: Collections of users—attach policies to groups, not individual users
Roles: Identities assumed by services, applications, or users—provide temporary credentials
Policies: JSON documents defining permissions, attached to users, groups, or roles

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:PutObject"
    ],
    "Resource": "arn:aws:s3:::my-bucket/*"
  }]
}

What are the different IAM policy types?

IAM policies come in several types that work together to determine effective permissions. Understanding policy types helps troubleshoot access issues and design proper security boundaries.

Type	Description
Identity-based	Attached to users, groups, roles
Resource-based	Attached to resources (S3 bucket policy, SQS policy)
Permission boundaries	Maximum permissions an identity can have
Service control policies	Organization-level limits (AWS Organizations)
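How the layers combine can be sketched in a few lines. This is a deliberately simplified, same-account model with hypothetical helper names. It omits resource-based policies, conditions, and `NotAction`. The model applies these rules:

- An explicit `Deny` anywhere wins.
- Otherwise, the action must be allowed by the identity policies and by every boundary or SCP layer that is present.
- Wildcards are matched with `fnmatch`, case-insensitively for actions.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _layer_result(policies, action, resource):
    """Return 'deny', 'allow', or None if no statement in this layer matches."""
    result = None
    for policy in policies:
        for stmt in policy["Statement"]:
            action_match = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
            resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
            if action_match and resource_match:
                if stmt["Effect"] == "Deny":
                    return "deny"  # explicit deny short-circuits
                result = "allow"
    return result


def is_allowed(action, resource, identity_policies, boundary=None, scps=None):
    layers = [identity_policies]
    if boundary is not None:
        layers.append([boundary])  # permission boundary caps identity permissions
    if scps is not None:
        layers.append(scps)        # SCPs cap everything in the member account
    results = [_layer_result(layer, action, resource) for layer in layers]
    if "deny" in results:
        return False
    return all(r == "allow" for r in results)  # no match in a layer = implicit deny


admin = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
boundary = {"Statement": [{"Effect": "Allow", "Action": "s3:GetObject",
                           "Resource": "arn:aws:s3:::my-bucket/*"}]}
print(is_allowed("s3:GetObject", "arn:aws:s3:::my-bucket/a.txt", [admin], boundary))  # True
print(is_allowed("s3:PutObject", "arn:aws:s3:::my-bucket/a.txt", [admin], boundary))  # False
```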

What is the principle of least privilege and why does it matter?

Least privilege means granting only the permissions needed to perform a task—no more. This fundamental security principle limits the blast radius when credentials are compromised or mistakes are made.

Bad practice:

{"Action": "*", "Resource": "*"}

Good practice:

{"Action": "s3:GetObject", "Resource": "arn:aws:s3:::specific-bucket/*"}

Always mention least privilege when discussing IAM in interviews—it demonstrates security awareness.

How should an EC2 instance access S3 or other AWS services?

EC2 instances should use IAM roles attached via instance profiles, never hardcoded access keys. The AWS SDK automatically retrieves temporary credentials from the instance metadata service, which rotate automatically.

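import boto3

# No access keys in code: the SDK's default credential chain fetches temporary
# credentials for the instance role from the instance metadata service
s3 = boto3.client('s3')
buckets = s3.list_buckets()['Buckets']
print([bucket['Name'] for bucket in buckets])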

Storing access keys on instances creates security risks: they don't rotate automatically, can be accidentally committed to version control, and persist if the instance is compromised.

How does cross-account IAM role assumption work?

Cross-account access enables resources in one AWS account to access resources in another without sharing long-term credentials. This pattern is common in multi-account architectures.

The process works as follows:

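Account B creates a role whose trust policy allows Account A to assume it
An IAM user or role in Account A, whose own permissions allow sts:AssumeRole on that role's ARN, calls sts:AssumeRole to get temporary credentials
Account A uses those credentials to access Account B's resources, limited to what the role's permission policy allows

Trust policy in Account B's role:
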
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::ACCOUNT_A_ID:root"},
    "Action": "sts:AssumeRole"
  }]
}
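From Account A's side, the call returns a temporary access key, secret key, and session token. This is a minimal boto3 sketch that assumes the caller already has permission to call `sts:AssumeRole` on the target role. The helper name, role name, and session name are placeholders. The STS client is passed in so the function can be exercised offline.

```python
def assumed_role_session_kwargs(sts, role_arn: str, session_name: str,
                                duration_seconds: int = 3600) -> dict:
    """Assume a role and return keyword arguments for boto3.Session."""
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,   # appears in Account B's CloudTrail
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]     # temporary; expire at creds["Expiration"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage from Account A:
#   import boto3
#   kwargs = assumed_role_session_kwargs(
#       boto3.client("sts"),
#       "arn:aws:iam::ACCOUNT_B_ID:role/CrossAccountReadOnly",  # placeholder role
#       "audit-session",
#   )
#   s3_in_account_b = boto3.Session(**kwargs).client("s3")
```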

RDS Database Questions

RDS (Relational Database Service) manages relational databases including MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Aurora.

What does RDS manage for you versus what you still manage?

RDS handles operational database tasks that traditionally consume significant DBA time, letting you focus on schema design and application development.

RDS manages:

Provisioning and patching
Backups (automated, point-in-time recovery)
High availability (Multi-AZ)
Read scaling (Read Replicas)
Monitoring and metrics

You still manage:

Schema design and optimization
Query performance tuning
Application-level concerns

What is the difference between RDS Multi-AZ and Read Replicas?

Multi-AZ and Read Replicas serve different purposes and can be used together. Confusing them is a common interview mistake.

Feature	Multi-AZ	Read Replica
Purpose	High availability	Read scaling
Replication	Synchronous	Asynchronous
Failover	Automatic	Manual promotion
Read traffic	No (standby not readable in a Multi-AZ DB instance deployment)	Yes
Cross-region	No	Yes

Multi-AZ provides automatic failover if the primary fails. Read Replicas distribute read traffic but don't provide automatic failover.

Your database needs both high availability and read scaling. What do you configure?

Enable both features—they're independent and complementary. Configure Multi-AZ for high availability with automatic failover to a synchronous standby. Create Read Replicas for read scaling to distribute read traffic across multiple database copies.

Multi-AZ ensures your database survives AZ failures. Read Replicas ensure read-heavy workloads don't overwhelm the primary. Most production databases benefit from both.

When would you choose Aurora over standard RDS MySQL?

Aurora is AWS's cloud-native relational database, compatible with MySQL and PostgreSQL but built differently under the hood. Choose Aurora when you need enhanced reliability and performance.

Aurora advantages:

Storage auto-scales up to 128 TB
6 copies of data across 3 AZs (vs 2 copies for standard RDS Multi-AZ)
Up to 15 Read Replicas (vs 5 for standard RDS)
Faster failover (typically under 30 seconds)
Aurora Serverless for variable workloads

Aurora costs more but provides enterprise-grade reliability. For smaller workloads where cost matters more than maximum availability, standard RDS is appropriate.

DynamoDB Questions

DynamoDB is AWS's fully managed NoSQL database, offering single-digit millisecond latency at any scale.

What is DynamoDB and what are its key characteristics?

DynamoDB is a key-value and document database that provides consistent performance regardless of data size. It's fully managed—no servers to provision, patch, or manage—and scales automatically.

Key characteristics:

Single-digit millisecond latency at any scale
Automatic scaling (on-demand or provisioned capacity)
Built-in replication across AZs
No servers to manage

flowchart TB
    subgraph table["Table: Orders"]
        pk["Partition Key: customer_id"]
        sk["Sort Key: order_date"]
        subgraph items["Items"]
            i1["{customer_id: 123,<br/>order_date: 2026-01-07,<br/>total: 99.99, ...}"]
            i2["{customer_id: 123,<br/>order_date: 2026-01-06,<br/>items: [...], ...}"]
        end
    end

How do DynamoDB partition keys and sort keys work?

The primary key design determines how DynamoDB stores and retrieves data. Good key design enables efficient queries; poor design creates hot partitions and limits query flexibility.

Partition Key: Determines which partition stores the item. DynamoDB hashes this key to distribute data across partitions. Without a sort key, each partition key value must be unique; with one, the combination of partition key and sort key must be unique.

Sort Key: Orders items within a partition, enabling range queries. Items with the same partition key are stored together, sorted by sort key.
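With the Orders table from the diagram, a query must pin the partition key with equality and may narrow the sort key with a range condition such as `begins_with`. The sketch below builds the arguments for the low-level boto3 `query` call. The helper name and month value are illustrative. As in the diagram, `customer_id` is a number and `order_date` is an ISO date string.

```python
def orders_for_month(customer_id: int, year_month: str, table_name: str = "Orders") -> dict:
    """Build query() arguments: one partition, sort-key range within it."""
    return {
        "TableName": table_name,
        # Partition key needs '='; the sort key allows =, <, <=, >, >=, BETWEEN, begins_with
        "KeyConditionExpression": "customer_id = :cid AND begins_with(order_date, :ym)",
        "ExpressionAttributeValues": {
            ":cid": {"N": str(customer_id)},  # the low-level API types every value
            ":ym": {"S": year_month},
        },
        "ScanIndexForward": False,            # newest order_date first
    }


# Usage:
#   import boto3
#   items = boto3.client("dynamodb").query(**orders_for_month(123, "2026-01"))["Items"]
```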

What are DynamoDB indexes and when do you use them?

Indexes enable queries on attributes other than the primary key. Without indexes, you can only query by partition key (and optionally sort key).

GSI (Global Secondary Index): Creates a new partition key (and optional sort key) from table attributes. Eventually consistent. Can be created anytime.

LSI (Local Secondary Index): Same partition key as the table, different sort key. Supports strongly consistent reads. Must be created at table creation time.

How would you design a DynamoDB table to query orders by customer and by status?

Design the primary key for the most common access pattern, then add indexes for secondary patterns.

Primary key: Partition key = customer_id, sort key = order_date. This efficiently supports "get all orders for customer X, sorted by date."

GSI: Partition key = order_status to support "get all orders with status 'pending'."

This design enables both access patterns with single queries rather than table scans.
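Expressed as `create_table` arguments, that design might look like the sketch below. The index name and on-demand billing mode are our assumptions. One caveat is worth raising in an interview. A status attribute has only a handful of values, so this GSI concentrates writes on a few partition key values. At high volume you would typically add a sort key or shard the status key.

```python
def orders_table_definition(table_name: str = "Orders") -> dict:
    """create_table() arguments for the customer/date table plus a status GSI."""
    return {
        "TableName": table_name,
        # Only key attributes (table and index) are declared; others are schemaless
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "N"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
            {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
        ],
        "GlobalSecondaryIndexes": [{
            "IndexName": "order_status-index",                    # hypothetical name
            "KeySchema": [{"AttributeName": "order_status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: no throughput to provision
    }


# Usage:
#   import boto3
#   boto3.client("dynamodb").create_table(**orders_table_definition())
```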

When should you choose DynamoDB versus RDS?

The choice depends on data model, query patterns, and scale requirements. Neither is universally better.

Factor	Choose RDS	Choose DynamoDB
Data model	Complex relationships, joins	Simple access patterns
Query patterns	Ad-hoc, complex queries	Known, limited patterns
Scale	Vertical (bigger instances)	Horizontal (unlimited)
Consistency	Strong (ACID)	Eventually consistent reads by default; strongly consistent reads and transactions available
Schema	Fixed schema	Flexible schema

Architecture and Troubleshooting Questions

These questions test your ability to apply AWS knowledge to real scenarios.

How would you design a highly available web application on AWS?

A highly available architecture eliminates single points of failure and survives component failures without downtime. The standard pattern uses multiple AZs with load balancing.

Architecture components:

Multi-AZ deployment across at least 2 AZs
Application Load Balancer distributing traffic
Auto Scaling Group for EC2 instances
RDS Multi-AZ for database
S3 for static assets, CloudFront for CDN
Private subnets for app/database, public for ALB
Security Groups limiting access between tiers

How would you reduce costs for a development environment?

Development environments don't need production-level availability, creating cost optimization opportunities. Balance cost savings against developer productivity.

Cost reduction strategies:

Use smaller instance types (t3.micro, t3.small)
Schedule instances to stop outside business hours (Lambda + EventBridge)
Use Spot Instances for non-critical workloads
Single-AZ RDS (availability less critical in dev)
Delete unused EBS volumes and snapshots
Review and right-size based on CloudWatch metrics
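The scheduling item can be sketched as a Lambda function that an EventBridge schedule invokes each evening. This sketch assumes dev instances carry the tag `Environment=dev`, a hypothetical convention. It uses the EC2 `describe_instances` and `stop_instances` calls. The client is a parameter so the core logic runs without AWS.

```python
def stop_tagged_instances(ec2, tag_key: str = "Environment", tag_value: str = "dev") -> list:
    """Stop every running instance carrying the given tag; return their IDs."""
    filters = [
        {"Name": f"tag:{tag_key}", "Values": [tag_value]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    instance_ids, token = [], None
    while True:  # follow NextToken until every page has been read
        kwargs = {"Filters": filters}
        if token:
            kwargs["NextToken"] = token
        page = ec2.describe_instances(**kwargs)
        for reservation in page["Reservations"]:
            instance_ids.extend(inst["InstanceId"] for inst in reservation["Instances"])
        token = page.get("NextToken")
        if not token:
            break
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime
    return {"stopped": stop_tagged_instances(boto3.client("ec2"))}
```

A matching morning schedule could call `start_instances` the same way.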

How would you design a system to process files uploaded to S3?

Event-driven architectures using S3 notifications and Lambda provide scalable file processing without managing servers.

Design:

S3 bucket with event notification on object creation
S3 triggers Lambda for processing
For large files, Lambda sends message to SQS, EC2/ECS workers process
Results stored in S3 or database
Dead-letter queue for failed processing
CloudWatch alarms for monitoring failures
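The first steps of this design can be sketched as a function that reads the S3 event records Lambda receives. It decides which files to handle inline and which to hand to SQS. The size threshold is hypothetical, and `route_s3_event` is our name. Object keys arrive URL-encoded in the event (a space becomes `+`), so they must be decoded before use.

```python
from urllib.parse import unquote_plus

LARGE_FILE_BYTES = 100 * 1024 * 1024  # hypothetical cut-off for inline processing


def route_s3_event(event: dict) -> tuple:
    """Split S3 event records into (inline, queue) lists of (bucket, key)."""
    inline, queue = [], []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded
        size = record["s3"]["object"].get("size", 0)
        (queue if size > LARGE_FILE_BYTES else inline).append((bucket, key))
    return inline, queue


# In the handler, inline files are processed directly and queued ones sent on,
# for example with sqs.send_message(QueueUrl=..., MessageBody=json.dumps({"bucket": b, "key": k}))
```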

An EC2 instance can't reach the internet. What do you check?

Systematic troubleshooting starts with identifying where traffic is blocked. Work through the network path from instance to internet.

Troubleshooting steps:

Is it in a public or private subnet?
Public subnet: Does it have a public IP? Is there a route to IGW?
Private subnet: Is there a NAT Gateway? Route to NAT?
Security Group: Outbound rules allow the traffic?
NACL: Allow outbound and inbound for return traffic?
Is the IGW/NAT Gateway actually created and attached?

A Lambda function times out when accessing RDS. Why?

Lambda timeout issues with RDS typically stem from networking configuration or connection management problems.

Common causes:

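Lambda in a VPC has no internet access by default, so calls to public AWS endpoints (for example, fetching database credentials from Secrets Manager) hang until timeout unless a NAT Gateway or VPC endpoint is in place
Security Group on RDS must allow traffic from Lambda's security group
Lambda must be in VPC and subnets that can reach RDS
Cold start plus connection establishment may exceed timeout
Connection exhaustion: each concurrent execution environment opens its own database connection, so open it outside the handler to reuse it across warm invocations, and consider RDS Proxy to pool connections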
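The connection-reuse fix relies on module-level state surviving between warm invocations of the same execution environment. The sketch below shows the pattern with `sqlite3` standing in for a real MySQL or PostgreSQL driver pointed at your RDS (or RDS Proxy) endpoint, so the example runs anywhere. `get_connection` is a hypothetical helper.

```python
import sqlite3

_connection = None
CONNECTIONS_OPENED = 0


def get_connection():
    """Open the connection once per execution environment, then reuse it."""
    global _connection, CONNECTIONS_OPENED
    if _connection is None:
        CONNECTIONS_OPENED += 1
        # Stand-in: in Lambda this would be your driver's connect() to RDS/RDS Proxy
        _connection = sqlite3.connect(":memory:")
    return _connection


def lambda_handler(event, context):
    row = get_connection().execute("SELECT 1").fetchone()
    return {"result": row[0], "connections_opened": CONNECTIONS_OPENED}


print(lambda_handler({}, None))
print(lambda_handler({}, None))  # warm invocation: still one connection
```

A production version would also detect a dropped connection and reconnect, since the database or proxy can close idle connections between invocations.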

S3 bucket policy allows access but requests are denied. Why?

When explicit allow policies don't grant access, something else is denying. Work through all policy types that could contain denials.

Check these in order:

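S3 Block Public Access enabled at bucket or account level (when the bucket policy grants public access)?
IAM policy on the user/role explicitly denying?
Permission boundary restricting access?
Service control policy in AWS Organizations denying the action?
VPC endpoint policy restricting access?
Bucket policy condition not met (IP, VPC, MFA)?
For SSE-KMS objects, does the KMS key policy allow the caller to use the key?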

Quick Reference

Topic	Key Points
EC2	Virtual servers, instance families for different workloads, pricing models
Lambda	Serverless, event-driven, 15-minute max, cold starts
S3	Object storage, storage classes, lifecycle policies, encryption
EBS	Block storage for EC2, volume types, snapshots
VPC	Subnets, route tables, Security Groups vs NACLs
IAM	Users, groups, roles, policies, least privilege
RDS	Managed relational, Multi-AZ for HA, Read Replicas for scaling
DynamoDB	NoSQL, partition/sort keys, GSI/LSI
