Terraform has become the standard for Infrastructure as Code. Whether you're managing AWS, Azure, GCP, or Kubernetes, interviewers expect you to understand how Terraform works—not just how to copy examples from the docs.
This guide covers what actually comes up in DevOps and cloud engineering interviews: state management, modules, environments, and the patterns that separate junior from senior engineers.
Table of Contents
- Terraform Fundamentals Questions
- HCL Language Questions
- State Management Questions
- Module Questions
- Workspace and Environment Questions
- Best Practices Questions
- Troubleshooting and Scenario Questions
Terraform Fundamentals Questions
Understanding Terraform's core concepts is essential for any DevOps interview.
What is Infrastructure as Code and why does it matter?
Infrastructure as Code (IaC) means managing infrastructure through configuration files rather than manual processes. Instead of clicking through cloud consoles or running ad-hoc commands, you define your entire infrastructure in version-controlled files that can be reviewed, tested, and applied consistently.
This approach transforms infrastructure management by bringing software engineering practices to operations. Your infrastructure becomes reproducible—the same configuration always produces the same result. Changes are tracked in version control, enabling code reviews and easy rollbacks. Automation becomes straightforward since you can integrate infrastructure changes into CI/CD pipelines.
Key benefits:
- Version control: Track changes, review PRs, rollback
- Reproducibility: Same config = same infrastructure
- Automation: CI/CD for infrastructure
- Documentation: Code is the documentation
How does Terraform compare to other IaC tools?
Each IaC tool has its strengths and ideal use cases. Terraform's main advantage is its provider ecosystem that works across any cloud or service with an API. CloudFormation is AWS-native and tightly integrated but locks you into one cloud. Ansible is procedural rather than declarative, making it better suited for configuration management than infrastructure provisioning.
The choice often depends on your organization's needs. Multi-cloud or hybrid environments benefit from Terraform's consistency. AWS-only shops might prefer CloudFormation's native integration. Teams with strong programming backgrounds might choose Pulumi or CDK for their familiar language syntax.
| Tool | Type | Language | Best For |
|---|---|---|---|
| Terraform | Declarative | HCL | Multi-cloud, any provider |
| CloudFormation | Declarative | YAML/JSON | AWS-only shops |
| Pulumi | Declarative | Python/TS/Go | Developers who prefer real languages |
| Ansible | Procedural | YAML | Configuration management |
| CDK | Declarative | Python/TS | AWS with programming languages |
When would you choose Terraform over CloudFormation?
This common interview question tests your understanding of tool selection based on requirements. Terraform excels when you need to work across multiple cloud providers or manage non-cloud resources like GitHub repositories, Datadog monitors, or Kubernetes clusters. Its provider ecosystem covers virtually any API-driven service.
Choose Terraform when you need multi-cloud support, consistent tooling across providers, or management of non-AWS resources. Choose CloudFormation when you're AWS-only, need tight AWS integration like StackSets and native drift detection, or when your organization has standardized on it.
What is the core Terraform workflow?
Terraform follows a simple but powerful workflow: initialize, plan, apply. Understanding this workflow and what happens at each stage demonstrates operational competency to interviewers.
The init phase downloads provider plugins and modules, sets up the backend, and prepares the working directory. Plan compares your configuration to the current state and shows what changes would be made without actually making them. Apply executes those changes, updating infrastructure to match your configuration.
# 1. Initialize - download providers, set up backend
terraform init
# 2. Plan - preview changes without applying
terraform plan
# 3. Apply - create/update infrastructure
terraform apply
# 4. Destroy - tear down infrastructure
terraform destroy

What happens during init:
- Downloads provider plugins
- Initializes backend (local or remote)
- Downloads modules
- Creates the `.terraform` directory
How do providers and resources work in Terraform?
Providers are plugins that know how to interact with specific APIs—AWS, Azure, Kubernetes, or any service with an API. Resources are the actual infrastructure components you want to manage, defined using the provider's resource types.
Each resource has a type (combining provider and resource kind) and a local name you use to reference it elsewhere in your configuration. The block contents specify the configuration arguments for that resource. Understanding this anatomy helps you read and write Terraform configurations fluently.
# Configure the AWS provider
provider "aws" {
region = "us-east-1"
}
# Create a resource
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "web-server"
}
}

Resource anatomy:
- `aws_instance` - resource type (provider_resource)
- `"web"` - local name (for referencing)
- Block contents - configuration arguments
Referencing resources:
# Reference another resource's attribute
resource "aws_eip" "web_ip" {
instance = aws_instance.web.id # type.name.attribute
}

HCL Language Questions
HCL (HashiCorp Configuration Language) is Terraform's domain-specific language for defining infrastructure.
How do you use variables in Terraform?
Variables make your Terraform configurations reusable and flexible. Input variables act as parameters—you define them in your configuration and provide values at runtime. Output variables export values for use by other configurations or for human consumption. Local variables are computed values for reuse within a module.
Understanding the different variable types and how to set them is fundamental Terraform knowledge. Variables can have defaults, validation rules, and type constraints that catch errors early.
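The validation rules mentioned above can be sketched like this (the variable name and allowed values are illustrative):

```hcl
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    # Reject anything outside the known environments at plan time
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```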
Input variables: Parameters for your configuration
# variables.tf
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.micro"
}
variable "environment" {
description = "Environment name"
type = string
# No default = required variable
}
variable "allowed_ports" {
description = "List of allowed ports"
type = list(number)
default = [80, 443]
}
variable "tags" {
description = "Resource tags"
type = map(string)
default = {}
}

Setting variables:
# Command line
terraform apply -var="environment=prod"
# Variable file
terraform apply -var-file="prod.tfvars"
# Environment variable
export TF_VAR_environment=prod
# Auto-loaded files: terraform.tfvars, *.auto.tfvars

Output variables: Export values for other configs or users
# outputs.tf
output "instance_ip" {
description = "Public IP of the instance"
value = aws_instance.web.public_ip
}
output "database_password" {
description = "Database password"
value = random_password.db.result
sensitive = true # Won't show in logs
}

Local variables: Computed values for reuse within a module
locals {
common_tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
name_prefix = "${var.project_name}-${var.environment}"
}
resource "aws_instance" "web" {
# ...
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-web"
})
}

What data types does Terraform support?
Terraform supports primitive types (string, number, bool) and collection types (list, set, map) as well as structural types (object, tuple). Understanding these types helps you write type-safe configurations and catch errors during planning rather than at apply time.
Lists maintain order and allow duplicates. Sets are unordered and unique. Maps store key-value pairs. Objects combine named attributes with different types, while tuples are ordered collections with mixed types.
# Primitives
string = "hello"
number = 42
bool = true
# Collections
list = ["a", "b", "c"] # Ordered, same type
set = toset(["a", "b", "c"]) # Unordered, unique
map = { key = "value" } # Key-value pairs
# Structural
object({
name = string
age = number
})
tuple([string, number, bool])

How do you write conditional expressions in Terraform?
Conditional expressions let you make decisions in your configuration based on variable values or other conditions. Terraform uses the ternary syntax common in many programming languages: condition ? true_value : false_value.
You can use conditionals for attribute values or combined with count to conditionally create entire resources. This pattern is essential for writing flexible modules that adapt to different environments or requirements.
# Ternary expression
resource "aws_instance" "web" {
instance_type = var.environment == "prod" ? "t3.large" : "t3.micro"
}
# Conditional resource creation
resource "aws_eip" "web" {
count = var.create_eip ? 1 : 0
instance = aws_instance.web.id
}

What is the difference between count and for_each?
This is one of the most common Terraform interview questions because it reveals understanding of resource addressing and state management. Count creates resources indexed by number, while for_each creates resources indexed by key. This seemingly small difference has major implications for how Terraform handles changes.
Count works well for creating N identical resources, but causes problems when you remove items from the middle of a list—all subsequent indices shift, causing Terraform to destroy and recreate resources. For_each avoids this by keying resources by name, so removing one item only affects that specific resource.
count: Create multiple resources by index
resource "aws_instance" "web" {
count = 3
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "web-${count.index}" # web-0, web-1, web-2
}
}
# Reference: aws_instance.web[0], aws_instance.web[1]

for_each: Create resources by key
variable "instances" {
default = {
web = "t3.micro"
api = "t3.small"
worker = "t3.medium"
}
}
resource "aws_instance" "server" {
for_each = var.instances
ami = "ami-0c55b159cbfafe1f0"
instance_type = each.value
tags = {
Name = each.key
}
}
# Reference: aws_instance.server["web"], aws_instance.server["api"]

When to use which:
| Use Case | Recommendation |
|---|---|
| N identical resources | count |
| Resources with unique identity | for_each |
| Might remove items from middle | for_each |
| List of objects | for_each with toset() or tomap() |
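The last table row can be illustrated with a sketch that converts a list to a set so for_each can key each resource by its value (the variable and bucket names are hypothetical):

```hcl
variable "bucket_suffixes" {
  type    = list(string)
  default = ["logs", "assets", "backups"]
}

resource "aws_s3_bucket" "this" {
  # toset() turns the list into a set; for sets, each.key equals each.value
  for_each = toset(var.bucket_suffixes)
  bucket   = "myproject-${each.key}"
}
```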
The count index problem:
# With count = ["a", "b", "c"]
# Removing "b" causes "c" to shift from index 2 to 1
# Terraform sees: destroy old [2], modify [1]
# Result: Unintended recreation
# With for_each = toset(["a", "b", "c"])
# Removing "b" only affects resource["b"]
# Resources "a" and "c" unchangedHow do data sources work in Terraform?
Data sources let you query existing infrastructure or external information to use in your configuration. Unlike resources which create and manage infrastructure, data sources are read-only—they fetch information that already exists.
Common uses include looking up the latest AMI, getting the current AWS account ID, or reading information about existing infrastructure that Terraform doesn't manage. This is essential for integrating Terraform with manually-created resources or resources managed by other teams.
# Get latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
# Use it
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
# ...
}
# Get current AWS account ID
data "aws_caller_identity" "current" {}
output "account_id" {
value = data.aws_caller_identity.current.account_id
}

State Management Questions
State management is arguably the most critical aspect of Terraform operations.
What is Terraform state and why is it important?
State is a JSON file that maintains the mapping between your configuration and real infrastructure. Without state, Terraform couldn't know which real resources correspond to which configuration blocks, what order to create or update resources, or what the current values of resource attributes are.
The state file contains resource IDs that let Terraform interact with the cloud provider API, dependency information for determining operation order, and cached attribute values that reduce API calls. Understanding state deeply is essential for troubleshooting and disaster recovery.
{
"resources": [
{
"type": "aws_instance",
"name": "web",
"instances": [
{
"attributes": {
"id": "i-1234567890abcdef0",
"ami": "ami-0c55b159cbfafe1f0",
"public_ip": "54.123.45.67"
}
}
]
}
]
}

Why state matters:
- Maps config to real resource IDs
- Tracks dependencies for ordering
- Caches attributes to reduce API calls
- Detects drift from desired state
Why should you use remote state backends?
Local state files work for individual learning but cause serious problems in team environments. Without remote state, team members can't collaborate—they don't have access to each other's state files. Concurrent applies can corrupt state or cause duplicate resources. There's no locking to prevent conflicts and no backup if the local file is lost.
Remote backends solve all these problems by storing state in a shared location with locking to prevent concurrent modifications. They also provide encryption at rest, versioning for recovery, and access control.
Never use local state in teams. Remote backends provide:
- Shared access for team members
- State locking to prevent conflicts
- Encryption at rest
- Versioning for recovery
S3 Backend (AWS):
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/network/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks" # For locking
}
}

GCS Backend (GCP):
terraform {
backend "gcs" {
bucket = "my-terraform-state"
prefix = "prod/network"
}
}

Terraform Cloud:
terraform {
cloud {
organization = "my-org"
workspaces {
name = "prod-network"
}
}
}

How does state locking work?
State locking prevents multiple users or processes from modifying state simultaneously, which could corrupt it or cause duplicate resources. When you run a Terraform operation that modifies state, Terraform first acquires a lock. If someone else has the lock, you wait until it's released.
For S3 backends, locking uses DynamoDB. Other backends have their own locking mechanisms. Understanding locking helps you troubleshoot situations where Terraform says the state is locked and you need to determine if another apply is running or if a lock is stuck.
sequenceDiagram
participant A as Developer A
participant S as State Backend
participant B as Developer B
A->>S: terraform apply
S-->>A: Lock acquired ✓
Note over A,S: Lock held by A
B->>S: terraform apply
S--xB: BLOCKED (lock held)
Note over B: Waiting for lock...
A->>S: Apply complete, release lock
S-->>A: Lock released
B->>S: Retry - acquire lock
S-->>B: Lock acquired ✓
Note over B,S: B proceeds with apply

DynamoDB table for S3 backend locking:
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}

What state commands should you know for interviews?
Terraform provides several commands for inspecting and manipulating state. These are essential for troubleshooting, refactoring, and disaster recovery. Interviewers often ask about specific commands and when you'd use them.
The most commonly used commands are state list and state show for inspection, state mv for refactoring, state rm for removing resources from management without destroying them, and import for bringing existing infrastructure under Terraform control.
# List resources in state
terraform state list
# Show specific resource
terraform state show aws_instance.web
# Move resource (rename)
terraform state mv aws_instance.web aws_instance.app
# Remove from state (doesn't destroy resource)
terraform state rm aws_instance.web
# Import existing resource into state
terraform import aws_instance.web i-1234567890abcdef0
# Force unlock (dangerous - use if lock is stuck)
terraform force-unlock LOCK_ID
# Pull remote state locally
terraform state pull > state.json
# Push local state to remote (dangerous)
terraform state push state.json

How do you import existing resources into Terraform?
Importing lets you bring manually-created resources under Terraform management without destroying and recreating them. This is essential when adopting Terraform for existing infrastructure or when resources were created outside your Terraform workflow.
The import process requires writing the resource configuration first, then running the import command with the resource address and real-world identifier. Finally, you run plan to verify your configuration matches the actual resource and adjust until there are no changes.
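Terraform 1.5 also added a declarative alternative: an import block in configuration that plan/apply executes. The resource address and ID below are illustrative:

```hcl
import {
  to = aws_instance.existing
  id = "i-1234567890abcdef0"
}

# Terraform 1.5+ can also generate matching configuration for you:
# terraform plan -generate-config-out=generated.tf
```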
# 1. Write the resource configuration
resource "aws_instance" "existing" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
}
# 2. Import the existing resource
# terraform import aws_instance.existing i-1234567890abcdef0
# 3. Run plan to verify configuration matches
# terraform plan
# Adjust config until no changes shown

Module Questions
Modules are the primary way to organize and reuse Terraform code.
Why should you use Terraform modules?
Modules package related resources together into reusable, shareable components. Instead of copying the same VPC configuration into every project, you define it once as a module and call it with different parameters. This reduces duplication, enforces consistency, and makes large configurations manageable.
Good modules hide implementation complexity behind a simple interface. Users don't need to understand all the resources involved—they just provide the required inputs and consume the outputs. This encapsulation also makes it easier to update implementations without affecting every project that uses the module.
Key benefits:
- Reusability: Write once, use many times
- Encapsulation: Hide complexity behind simple interface
- Consistency: Enforce standards across teams
- Versioning: Control updates and changes
How should you structure a Terraform module?
Module structure follows conventions that make modules predictable and easy to use. Every module needs at least main.tf for resources, variables.tf for inputs, and outputs.tf for values other configurations can use. Additional files like versions.tf for provider requirements and README.md for documentation are best practices.
Following these conventions means anyone familiar with Terraform can quickly understand your module's interface and implementation.
modules/
└── vpc/
├── main.tf # Resources
├── variables.tf # Input variables
├── outputs.tf # Output values
├── versions.tf # Provider requirements
└── README.md # Documentation
Example module:
# modules/vpc/variables.tf
variable "name" {
description = "VPC name"
type = string
}
variable "cidr" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "azs" {
description = "Availability zones"
type = list(string)
}
# modules/vpc/main.tf
resource "aws_vpc" "this" {
cidr_block = var.cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = var.name
}
}
resource "aws_subnet" "public" {
count = length(var.azs)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.cidr, 8, count.index)
availability_zone = var.azs[count.index]
tags = {
Name = "${var.name}-public-${var.azs[count.index]}"
}
}
# modules/vpc/outputs.tf
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.this.id
}
output "public_subnet_ids" {
description = "Public subnet IDs"
value = aws_subnet.public[*].id
}

Using the module:
module "vpc" {
source = "./modules/vpc"
name = "production"
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
# Reference outputs
resource "aws_instance" "web" {
subnet_id = module.vpc.public_subnet_ids[0]
# ...
}

What module sources does Terraform support?
Terraform can load modules from various sources, giving you flexibility in how you organize and share code. Local paths are simplest for development. The Terraform Registry provides public and private modules with versioning. Git repositories (GitHub, GitLab, Bitbucket) work for private modules with tag-based versioning.
Each source type has trade-offs. Local paths are convenient but don't version well. Registry modules have excellent versioning but require publishing. Git sources balance flexibility and versioning but require careful reference management.
# Local path
module "vpc" {
source = "./modules/vpc"
}
# Terraform Registry
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0"
}
# GitHub
module "vpc" {
source = "github.com/org/repo//modules/vpc?ref=v1.0.0"
}
# S3 bucket
module "vpc" {
source = "s3::https://s3-eu-west-1.amazonaws.com/bucket/vpc.zip"
}

How should you version modules in production?
Version pinning is essential for stable production infrastructure. Without it, module updates could unexpectedly change your infrastructure. Always specify exact versions or constrained ranges in production configurations.
Exact versions provide the most stability but require manual updates. Pessimistic constraints (~>) allow patch updates while preventing breaking changes. Test module updates in non-production environments before promoting version changes to production.
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0" # Exact version
# Or version constraints
# version = "~> 5.0" # >= 5.0.0, < 6.0.0
# version = ">= 5.0" # >= 5.0.0
}

Workspace and Environment Questions
Managing multiple environments is a common challenge in Terraform.
How do Terraform workspaces work?
Workspaces let you maintain multiple state files for the same configuration. Each workspace has its own state, so you can deploy the same infrastructure to dev, staging, and production with environment-specific differences controlled by the workspace name.
Workspaces are convenient for environments that are structurally similar. They share the same backend configuration and use conditional logic based on terraform.workspace to vary settings. However, they provide less isolation than separate directory structures.
# List workspaces
terraform workspace list
# Create workspace
terraform workspace new staging
# Switch workspace
terraform workspace select production
# Show current
terraform workspace show
# Delete workspace
terraform workspace delete staging

Using workspace in config:
resource "aws_instance" "web" {
instance_type = terraform.workspace == "prod" ? "t3.large" : "t3.micro"
tags = {
Environment = terraform.workspace
}
}

When should you use directory structure instead of workspaces?
The directory approach creates stronger isolation between environments by giving each its own configuration files, backend configuration, and potentially different module versions. This isolation reduces the risk of accidentally applying to the wrong environment and makes environment-specific customization more explicit.
Choose directories when environments have significant structural differences, when you need strict isolation (especially for production), or when different teams manage different environments. Use workspaces when environments are nearly identical and isolation requirements are lower.
terraform/
├── modules/
│ └── app/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ ├── staging/
│ │ └── ...
│ └── prod/
│ └── ...
Each environment has its own:
- State file (different backend key)
- Variable values
- Provider configuration if needed
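Each environment's backend.tf then points at its own state key; a minimal sketch, assuming the S3 backend and bucket name used earlier in this guide:

```hcl
# environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate" # dev/ and staging/ use their own keys
    region = "us-east-1"
  }
}
```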
What are the trade-offs between workspaces and directories?
This question tests your ability to evaluate solutions based on specific requirements rather than following a single pattern blindly. Both approaches have valid use cases, and senior engineers understand when to use each.
| Aspect | Workspaces | Directories |
|---|---|---|
| State isolation | Same backend, different keys | Completely separate |
| Code duplication | None | Some (can use modules) |
| Variable differences | Conditional logic | Separate tfvars |
| Accidental cross-apply | Possible (wrong workspace) | Harder (different directory) |
| Best for | Similar environments | Very different environments |
Recommendation: Use directories for prod vs non-prod, workspaces for similar environments (dev1, dev2).
Best Practices Questions
Following established patterns separates professional Terraform users from beginners.
How should you organize Terraform code in a project?
Code organization affects maintainability and collaboration. The standard pattern separates concerns into distinct files: main.tf for resources, variables.tf for inputs, outputs.tf for exports, and so on. This convention makes large configurations navigable and helps team members find what they're looking for quickly.
Consistent organization across projects reduces cognitive load when switching between codebases and makes onboarding new team members faster.
project/
├── main.tf # Primary resources
├── variables.tf # All variable declarations
├── outputs.tf # All outputs
├── versions.tf # Terraform and provider versions
├── providers.tf # Provider configurations
├── locals.tf # Local values
├── data.tf # Data sources
└── terraform.tfvars # Variable values (don't commit secrets)
versions.tf:
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}

What naming conventions should you follow?
Consistent naming makes configurations readable and maintainable. Terraform conventions use lowercase with underscores for all identifiers. Resource names should be descriptive but concise, indicating what the resource is for. Variables and outputs follow the same pattern.
Following community conventions means your code is immediately readable by other Terraform users and tools that expect standard patterns will work correctly.
# Resources: descriptive, lowercase, underscores
resource "aws_instance" "web_server" { }
resource "aws_security_group" "web_sg" { }
# Variables: lowercase, underscores
variable "instance_type" { }
variable "environment_name" { }
# Outputs: lowercase, underscores, descriptive
output "load_balancer_dns" { }
# Locals: lowercase, underscores
locals {
common_tags = { }
}

How should you handle secrets in Terraform?
Secrets management is critical and frequently tested in interviews because getting it wrong has serious security implications. Never hardcode secrets in Terraform files or commit them to version control. The state file contains sensitive data too, so always encrypt remote state at rest.
The best approaches reference secrets from external stores at runtime, use environment variables for CI/CD, or generate secrets with Terraform and immediately store them in a secrets manager. Mark sensitive outputs to prevent accidental exposure in logs.
Never do this:
# BAD - secrets in code
resource "aws_db_instance" "db" {
password = "supersecret123" # NO!
}

Better approaches:
# 1. Variable with no default (prompt or tfvars)
variable "db_password" {
type = string
sensitive = true
}
# 2. Environment variable
# export TF_VAR_db_password=xxx
# 3. External secret store
data "aws_secretsmanager_secret_version" "db" {
secret_id = "prod/db/password"
}
resource "aws_db_instance" "db" {
password = data.aws_secretsmanager_secret_version.db.secret_string
}
# 4. Generate and store
resource "random_password" "db" {
length = 32
special = true
}
resource "aws_secretsmanager_secret_version" "db" {
secret_id = aws_secretsmanager_secret.db.id
secret_string = random_password.db.result
}

How do you integrate Terraform with CI/CD?
CI/CD integration is standard for production Terraform usage. Pipelines enforce code quality checks, provide visibility into planned changes through PR comments, and ensure changes are applied consistently. Manual applies become exceptions rather than the norm.
Key practices include running format and validate on every commit, running plan on pull requests so reviewers see proposed changes, requiring approval before apply to production, and using OIDC authentication to avoid long-lived credentials.
GitHub Actions example:
name: Terraform
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
terraform:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.6.0
- name: Terraform Init
run: terraform init
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
- name: Terraform Format Check
run: terraform fmt -check
- name: Terraform Plan
run: terraform plan -no-color
if: github.event_name == 'pull_request'
- name: Terraform Apply
run: terraform apply -auto-approve
if: github.ref == 'refs/heads/main' && github.event_name == 'push'

Best practices for CI/CD:
- Run `terraform fmt -check` to enforce formatting
- Run `terraform validate` for syntax errors
- Always run `plan` on PRs
- Require approval before `apply` to production
- Use OIDC for cloud authentication (no long-lived keys)
Troubleshooting and Scenario Questions
Scenario questions test your practical experience with real-world problems.
How do you handle state drift?
State drift occurs when real infrastructure differs from what Terraform expects, usually because someone made manual changes. This is a common operational challenge that interviewers want to know you can handle.
The response depends on whether you want to keep the manual change or revert it. If keeping it, update your configuration to match reality. If reverting, let Terraform apply correct the drift. Prevention is better than cure—lock down manual access and make all changes through Terraform pipelines.
# 1. Detect drift
terraform plan
# Shows: aws_instance.web will be updated (instance_type changed)
# 2. Decision point:
# Keep manual change → update your config to match
# Revert manual change → apply to correct drift
# 3. If keeping, update config:
resource "aws_instance" "web" {
instance_type = "t3.large" # Match manual change
}
# 4. Verify no changes
terraform plan
# No changes. Your infrastructure matches the configuration.

Prevention:
- Lock down console access
- Use CI/CD for all changes
- Enable drift detection alerts
- Regular `terraform plan` in CI
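For drift you deliberately tolerate (for example, tags edited by an external cost-tagging tool), a lifecycle ignore_changes rule tells Terraform to leave specific attributes alone; a hedged sketch:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  lifecycle {
    # Out-of-band changes to tags will no longer show up as drift
    ignore_changes = [tags]
  }
}
```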
What do you do when Terraform apply fails halfway?
Partial failures leave your infrastructure in an inconsistent state—some resources created, others not. Understanding recovery is essential for production operations. Terraform's state accurately reflects what was created, so you can inspect it to understand the current situation.
The recovery path depends on the failure cause. Often you can fix the configuration error and re-run apply. Sometimes you need to taint a resource to force recreation. In rare cases, you might need to manually fix state or restore from a backup.
# 1. Check what was created
terraform state list
# 2. Check current state vs desired
terraform plan
# 3. Options:
# - Fix the error and re-run apply
# - If resource is broken, force recreation:
terraform taint aws_instance.web
terraform apply
# (taint is deprecated; on Terraform 0.15.2+ prefer:
#  terraform apply -replace=aws_instance.web)
# 4. If state is corrupted:
# - Restore from state backup (S3 versioning)
# - Or manually fix with state commands

How do you rename a resource without destroying it?
Resource renaming is a common refactoring need, but Terraform interprets a name change as "delete old, create new." The state mv command tells Terraform the resource moved rather than being replaced, preserving the actual infrastructure.
This technique is essential for code cleanup without infrastructure impact. Always verify with a plan after moving to ensure Terraform sees no changes needed.
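On Terraform 1.1+, a moved block is a declarative alternative to state mv: commit it alongside the rename and Terraform records the move at the next plan/apply. A minimal sketch:

```hcl
moved {
  from = aws_instance.web
  to   = aws_instance.application
}
```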
# Before
resource "aws_instance" "web" { }
# After
resource "aws_instance" "application" { }# 1. Move in state
terraform state mv aws_instance.web aws_instance.application
# 2. Update code to use new name
# 3. Verify no changes
terraform plan
# No changes.

How do you migrate from local to remote state?
State migration is needed when adopting proper Terraform practices or changing backends. Terraform handles this gracefully with the init -migrate-state flag, which copies your existing state to the new backend.
Ensure the new backend is configured correctly before migrating. After migration, verify the remote state contains your resources and consider deleting the local state file to avoid confusion.
# 1. Add backend configuration
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
}
}
# 2. Initialize with migration
terraform init -migrate-state
# Terraform will prompt to copy existing state to new backend

Quick Reference
What are the essential Terraform commands?
| Command | Purpose |
|---|---|
| `terraform init` | Initialize working directory |
| `terraform plan` | Preview changes |
| `terraform apply` | Apply changes |
| `terraform destroy` | Destroy infrastructure |
| `terraform fmt` | Format code |
| `terraform validate` | Validate syntax |
| `terraform state list` | List resources in state |
| `terraform import` | Import existing resource |
| `terraform output` | Show outputs |
What are common Terraform patterns?
# Conditional resource
count = var.create_resource ? 1 : 0
# Conditional attribute
instance_type = var.env == "prod" ? "t3.large" : "t3.micro"
# Dynamic blocks
dynamic "ingress" {
for_each = var.ports
content {
from_port = ingress.value
to_port = ingress.value
protocol = "tcp"
}
}
# Depends on (explicit dependency)
depends_on = [aws_iam_role_policy.example]
# Lifecycle rules
lifecycle {
create_before_destroy = true
prevent_destroy = true
ignore_changes = [tags]
}

Related Articles
This guide connects to the broader DevOps interview preparation:
Cloud Platforms:
- AWS Interview Guide - AWS resources and services
- Azure Interview Guide - Azure ARM comparison
- GCP Interview Guide - GCP Deployment Manager comparison
DevOps Fundamentals:
- CI/CD & GitHub Actions Interview Guide - Terraform in pipelines
- Docker Interview Guide - Container infrastructure
- Kubernetes Interview Guide - K8s provider
Architecture:
- System Design Interview Guide - Infrastructure patterns
- Networking Interview Guide - VPC and network resources
