Networking knowledge separates good DevOps engineers from great ones. When production goes down at 3 AM, you need to know whether the problem is DNS, a firewall rule, or a failing load balancer—and you need to know fast.
This guide covers the networking fundamentals that come up in DevOps, SRE, and backend interviews. Not certification-level theory, but practical knowledge for debugging real systems.
Networking Fundamentals
The OSI Model (Practical View)
You don't need to memorize all seven layers. Focus on the ones that matter for troubleshooting:
Layer 7 - Application: HTTP, DNS, SSH (what your app speaks)
Layer 4 - Transport: TCP, UDP (how data gets delivered)
Layer 3 - Network: IP, routing (where data goes)
Layer 2 - Data Link: MAC addresses, switches (local network)
Layer 1 - Physical: Cables, signals (hardware)
Why it matters: When debugging, you work up the stack. Can't ping? Layer 3 issue. Can ping but can't connect to port? Layer 4. Connection works but app fails? Layer 7.
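The triage order above can be written down as a tiny decision function. This is only a sketch: the function name and its three boolean inputs are made up for illustration, and in practice each input would come from a real check (ping, a port test, an application request).

```python
def first_failing_layer(ping_ok: bool, port_ok: bool, app_ok: bool) -> str:
    """Walk up the stack and report the first layer whose check fails."""
    if not ping_ok:
        return "Layer 3"   # host unreachable: routing, firewall, or blocked ICMP
    if not port_ok:
        return "Layer 4"   # host answers, but the TCP/UDP port does not
    if not app_ok:
        return "Layer 7"   # connection works, the application itself fails
    return "none"


print(first_failing_layer(ping_ok=True, port_ok=False, app_ok=False))
```

The point of the ordering is that a failure at a lower layer makes the results of higher-layer checks meaningless, so the lowest failing check wins.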
TCP vs UDP
One of the most common protocol questions in interviews.
TCP (Transmission Control Protocol):
- Connection-oriented (three-way handshake)
- Reliable delivery via acknowledgments and retransmission
- Ordered packets
- Flow control and congestion control
- Used for: HTTP, HTTPS, SSH, FTP, SMTP, databases
UDP (User Datagram Protocol):
- Connectionless (fire and forget)
- No delivery guarantee
- No ordering guarantee
- Lower latency, less overhead
- Used for: DNS queries, video streaming, gaming, VoIP
TCP Three-Way Handshake:
Client Server
|---- SYN ---->| "I want to connect"
|<-- SYN-ACK --| "OK, I acknowledge"
|---- ACK ---->| "Great, let's talk"
| |
|-- DATA <---> | Connection established
Example question: "When would you choose UDP over TCP?"
When latency matters more than reliability. Live video and VoIP can skip frames, because a retransmitted packet that arrives late is useless. DNS queries are small and can simply be retried. Gaming needs real-time updates where old data is worthless.
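To make the difference concrete, here is a minimal sketch using Python's standard socket module on the loopback interface. The function names are hypothetical, and both ends live in one process purely to keep the example self-contained. The TCP path has to connect (handshake) before any data moves; the UDP path just sends a datagram.

```python
import socket


def tcp_round_trip(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection; return what the server side read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() returns once the kernel has completed the three-way handshake
            client.connect(server.getsockname())
            conn, _addr = server.accept()
            with conn:
                client.sendall(payload)
                return conn.recv(4096)          # small payload, so one recv is enough here


def udp_one_way(payload: bytes) -> bytes:
    """Send a single datagram over loopback UDP; no handshake, no acknowledgment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(1.0)                # a lost datagram would otherwise block forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(payload, receiver.getsockname())
            data, _addr = receiver.recvfrom(4096)
            return data


print(tcp_round_trip(b"hello over tcp"))
print(udp_one_way(b"hello over udp"))
```

Note the timeout on the UDP receiver: with no delivery guarantee, handling loss is the application's job.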
IP Addressing and Subnets
IPv4 addresses: Four octets (0-255 each), like 192.168.1.100
Private IP ranges (RFC 1918):
- 10.0.0.0/8 - 16 million addresses (large networks)
- 172.16.0.0/12 - 1 million addresses (medium networks)
- 192.168.0.0/16 - 65,536 addresses (home/small networks)
CIDR notation: Network address + prefix length
10.0.0.0/8 = 10.0.0.0 - 10.255.255.255 (16,777,216 IPs)
10.0.0.0/16 = 10.0.0.0 - 10.0.255.255 (65,536 IPs)
10.0.0.0/24 = 10.0.0.0 - 10.0.0.255 (256 IPs)
10.0.0.0/32 = 10.0.0.0 (1 IP - single host)
Subnet math shortcut:
- /24 = 256 addresses (2^8)
- /25 = 128 addresses (2^7)
- /26 = 64 addresses (2^6)
- Subtract 2 for network and broadcast addresses
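The shortcut is just 2^(32 - prefix). A quick way to check your mental math is Python's standard ipaddress module. The helper names below are made up for this sketch, and the handling of /31 and /32 (no subtraction) is an assumption that reflects point-to-point links and single-host routes.

```python
import ipaddress


def address_count(cidr: str) -> int:
    """Total addresses in the block, i.e. 2 ** (32 - prefix) for IPv4."""
    return ipaddress.ip_network(cidr).num_addresses


def usable_hosts(cidr: str) -> int:
    """Addresses left after removing the network and broadcast addresses."""
    network = ipaddress.ip_network(cidr)
    if network.prefixlen >= 31:
        # Assumption: /31 (point-to-point) and /32 (single host) keep every address
        return network.num_addresses
    return network.num_addresses - 2


for block in ("10.0.0.0/24", "10.0.0.0/25", "10.0.0.0/26"):
    print(block, address_count(block), usable_hosts(block))
```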
Example question: "Design IP addressing for a VPC with three subnets."
VPC: 10.0.0.0/16 (65,536 addresses total)
Subnets:
- Public: 10.0.1.0/24 (web servers, load balancers)
- Private: 10.0.2.0/24 (application servers)
- Data: 10.0.3.0/24 (databases)
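Each subnet has 254 usable IPs by standard subnet math, plenty of room to grow. Cloud providers reserve a few more addresses per subnet (AWS reserves 5, leaving 251).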
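A plan like this is easy to sanity-check in code. The sketch below uses the standard ipaddress module to carve the /16 into /24 blocks and confirm that the three subnets sit inside the VPC and do not overlap. The dictionary keys and the has_overlap helper are illustrative names, not part of any cloud API.

```python
import ipaddress
from itertools import combinations

vpc = ipaddress.ip_network("10.0.0.0/16")
blocks = list(vpc.subnets(new_prefix=24))      # all 256 /24 blocks inside the VPC

# Same layout as the answer above: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
plan = {"public": blocks[1], "private": blocks[2], "data": blocks[3]}


def has_overlap(subnets) -> bool:
    """True if any two subnets share addresses."""
    return any(a.overlaps(b) for a, b in combinations(list(subnets), 2))


for name, subnet in plan.items():
    print(name, subnet, "inside VPC:", subnet.subnet_of(vpc))
print("overlap:", has_overlap(plan.values()))
```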
Common Ports
Know these by heart:
| Port | Service | Protocol |
|---|---|---|
| 22 | SSH | TCP |
| 80 | HTTP | TCP |
| 443 | HTTPS | TCP |
| 53 | DNS | UDP/TCP |
| 25 | SMTP | TCP |
| 3306 | MySQL | TCP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 27017 | MongoDB | TCP |
DNS Deep Dive
DNS translates human-readable names to IP addresses. It's involved in almost every network issue.
How DNS Resolution Works
1. Browser cache → Already know google.com? Use cached IP
2. OS cache → Check /etc/hosts and system DNS cache
3. Resolver → Ask configured DNS server (ISP, 8.8.8.8, etc.)
4. Root servers → "Who handles .com?"
5. TLD servers → "Who handles google.com?"
6. Authoritative NS → "google.com is 142.250.x.x"
7. Cache the result → Store for TTL duration
Recursive vs Iterative:
- Recursive: Resolver does all the work, returns final answer
- Iterative: Each server says "I don't know, ask them"
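The referral chain is easier to remember once you have walked it by hand. Below is a toy model of iterative resolution, not a real resolver: the server names, zones, and the 192.0.2.10 address are all invented for illustration, and the "network" is just a dictionary.

```python
# Toy delegation data. Every name and address here is hypothetical.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}, "records": {}},
    "tld-com": {"referrals": {"example.com.": "ns-example"}, "records": {}},
    "ns-example": {"referrals": {}, "records": {"www.example.com.": "192.0.2.10"}},
}


def in_zone(name: str, zone: str) -> bool:
    """True if name equals the zone or sits below it (match on label boundaries)."""
    return name == zone or name.endswith("." + zone)


def resolve_iteratively(name: str):
    """Start at the root and follow referrals until a server holds the record."""
    server = "root"
    path = [server]
    while True:
        data = SERVERS[server]
        if name in data["records"]:
            return data["records"][name], path
        candidates = [zone for zone in data["referrals"] if in_zone(name, zone)]
        if not candidates:
            raise LookupError(f"NXDOMAIN: {name}")
        # "I don't know, ask them": follow the most specific referral
        server = data["referrals"][max(candidates, key=len)]
        path.append(server)


print(resolve_iteratively("www.example.com."))
```

A recursive resolver is the component that runs this loop on the client's behalf and hands back only the final answer.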
DNS Record Types
| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 address | example.com → 2606:2800:220:1:... |
| CNAME | Alias to another name | www.example.com → example.com |
| MX | Mail server (with priority) | example.com → mail.example.com (10) |
| TXT | Arbitrary text | SPF, DKIM, domain verification |
| NS | Nameserver delegation | example.com → ns1.example.com |
| PTR | Reverse lookup (IP → name) | 34.216.184.93.in-addr.arpa → example.com |
| SOA | Start of Authority | Zone metadata, serial numbers |
CNAME restrictions:
- Cannot be used at zone apex (root domain)
- www.example.com → CNAME OK
- example.com → CNAME NOT OK (use ALIAS or A record)
TTL (Time To Live)
How long DNS records are cached.
Low TTL (60-300 seconds):
+ Quick propagation for changes
+ Easier failover
- More DNS queries
- Higher latency
High TTL (3600-86400 seconds):
+ Fewer DNS queries
+ Better performance
- Slow propagation
- Harder to change quickly
Best practice: Use low TTL before planned changes, raise it after.
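The trade-off falls out of how a cache treats TTL. Here is a minimal sketch of a TTL-respecting cache. The class name is hypothetical, and the injectable clock exists only so the behavior can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Keeps an answer until its TTL runs out, then forces a fresh lookup."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}          # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller has to query DNS again
            return None
        return value


fake_now = [0.0]
cache = TtlCache(clock=lambda: fake_now[0])
cache.put("example.com", "192.0.2.10", ttl_seconds=60)
fake_now[0] = 59.0
print(cache.get("example.com"))     # still cached
fake_now[0] = 60.0
print(cache.get("example.com"))     # expired
```

With a low TTL the second lookup happens sooner, which is exactly why changes propagate faster and query volume goes up.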
DNS Troubleshooting
# Basic lookup
dig example.com
nslookup example.com
# Query specific record type
dig example.com MX
dig example.com TXT
# Query specific nameserver
dig @8.8.8.8 example.com
# Trace the full resolution path
dig +trace example.com
# Check TTL remaining
dig example.com | grep -E "^example"
# example.com. 234 IN A 93.184.216.34
# ^^^-- seconds until cache expires
# Reverse lookup
dig -x 93.184.216.34
Common DNS issues:
- NXDOMAIN: Domain doesn't exist
- SERVFAIL: Nameserver error
- Timeout: Network issue or server down
- Wrong IP: Stale cache or misconfiguration
HTTP & HTTPS
HTTP Methods
| Method | Purpose | Idempotent | Safe |
|---|---|---|---|
| GET | Retrieve resource | Yes | Yes |
| POST | Create resource | No | No |
| PUT | Replace resource | Yes | No |
| PATCH | Partial update | No | No |
| DELETE | Remove resource | Yes | No |
| HEAD | GET without body | Yes | Yes |
| OPTIONS | Get allowed methods | Yes | Yes |
Idempotent: Multiple identical requests have same effect as one. Safe: Doesn't modify server state.
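Idempotency is easiest to see by repeating a call and checking the state afterwards. The in-memory store below is a made-up stand-in for a server, not a real HTTP API. It shows why a client can blindly retry PUT and DELETE but not POST.

```python
class ToyStore:
    """Hypothetical resource store used to compare POST, PUT, and DELETE semantics."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def post(self, value):
        """Create: every call adds a new item, so a repeat changes the outcome."""
        item_id = self._next_id
        self._next_id += 1
        self.items[item_id] = value
        return item_id

    def put(self, item_id, value):
        """Replace: repeating the same call leaves the store in the same state."""
        self.items[item_id] = value

    def delete(self, item_id):
        """Remove: deleting an already-deleted item changes nothing further."""
        self.items.pop(item_id, None)


store = ToyStore()
store.put(100, "profile")
store.put(100, "profile")       # retry: still one item
store.post("order")
store.post("order")             # retry: now there are two orders
print(len(store.items))
```

This is why retried POSTs usually need something extra, such as a client-supplied idempotency key, to be safe.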
HTTP Status Codes
1xx - Informational (100 Continue, 101 Switching Protocols)
2xx - Success (200 OK, 201 Created, 204 No Content)
3xx - Redirection (301 Moved Permanently, 302 Found, 304 Not Modified)
4xx - Client Error (400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found)
5xx - Server Error (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout)
Know these well:
- 401 vs 403: 401 = not authenticated, 403 = authenticated but not authorized
- 502 vs 503 vs 504: 502 = bad response from upstream, 503 = service unavailable (overloaded or down for maintenance), 504 = upstream timeout
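The first digit carries the class, which is handy when you are scanning logs. A small sketch using the standard library's http.HTTPStatus follows. The helper names are made up, and describe() assumes a code the standard library knows about.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return CLASSES.get(code // 100, "unknown")


def describe(code: int) -> str:
    """Code, standard reason phrase, and class. Raises ValueError for unknown codes."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


for code in (401, 403, 502, 503, 504):
    print(describe(code))
```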
TLS/SSL Handshake
Client Server
| |
|------ Client Hello --------->| Supported cipher suites, TLS version
|<----- Server Hello ----------| Chosen cipher, certificate
| |
| [Verify certificate chain] |
| |
|------ Key Exchange --------->| Generate session keys
|<----- Key Exchange ----------|
| |
|====== Encrypted Traffic =====| All data now encrypted
Certificate chain:
- Server certificate (your domain)
- Intermediate certificate(s)
- Root certificate (trusted by browsers)
Common TLS issues:
- Certificate expired: Check notAfter date
- Name mismatch: Certificate doesn't match domain
- Incomplete chain: Missing intermediate certificate
- Self-signed: Not trusted by default
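Expiry is the easiest of these to monitor automatically. The sketch below turns a notAfter string, in the format that openssl x509 -dates prints and that Python's getpeercert() returns, into days remaining. The function name is hypothetical and the date in the example is invented.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """not_after looks like 'Jun  1 12:00:00 2030 GMT'; now must be timezone-aware."""
    expires_ts = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_ts, tz=timezone.utc)
    return (expires - now).days


# Made-up certificate date, checked against a fixed "now" so the result is repeatable
reference = datetime(2030, 5, 2, 12, 0, 0, tzinfo=timezone.utc)
print(days_until_expiry("Jun  1 12:00:00 2030 GMT", reference))
```

A negative result means the certificate has already expired. In a real check you would pass datetime.now(timezone.utc) and alert well before the number reaches zero.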
# Check certificate
openssl s_client -connect example.com:443 -servername example.com
# Check expiration
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates
HTTP/2 and HTTP/3
HTTP/1.1 limitations:
- One in-flight request at a time per connection (head-of-line blocking)
- Text-based headers (inefficient)
- No server push
HTTP/2 improvements:
- Multiplexing (multiple requests over one connection)
- Header compression (HPACK)
- Server push (since dropped by major browsers such as Chrome)
- Binary protocol
HTTP/3 improvements:
- QUIC protocol (UDP-based)
- No head-of-line blocking at transport layer
- Faster connection establishment
- Better mobile performance (connection migration)
Load Balancing
Layer 4 vs Layer 7
Layer 4 (Transport):
- Routes based on IP address and port
- No content inspection
- Faster, less CPU intensive
- Use for: TCP/UDP passthrough, non-HTTP protocols
Layer 7 (Application):
- Inspects HTTP headers, URLs, cookies
- Can route based on content
- SSL termination
- Use for: HTTP routing, path-based routing, A/B testing
Layer 4 Load Balancer:
Client → [L4 LB] → Server
(routes by IP:port)
Layer 7 Load Balancer:
Client → [L7 LB] → Server
(routes by /api/* → api-servers)
(routes by /static/* → cdn)
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Equal capacity servers |
| Weighted Round Robin | Rotate with weights | Mixed capacity servers |
| Least Connections | Send to server with fewest connections | Varying request duration |
| IP Hash | Hash client IP to choose server | Session affinity |
| Least Response Time | Send to fastest responding server | Performance optimization |
| Random | Random selection | Simple, surprisingly effective |
Example question: "Your users report slow responses. How might the load balancing algorithm affect this?"
If using round robin with servers of different capacities, slower servers get equal traffic and become bottlenecks. Switch to least connections or weighted round robin. If requests have varying durations, least connections prevents queue buildup on slow servers.
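The three algorithms in that answer fit in a few lines each. This is a deliberately naive sketch with made-up function names. Production load balancers use smoother weighted scheduling and track connection counts themselves.

```python
import itertools


def round_robin(servers):
    """Endless rotation through the servers in order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """weights maps server -> integer weight; a weight of 2 gets two turns per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """active maps server -> open connection count; pick the least busy server."""
    return min(active, key=active.get)


rotation = round_robin(["a", "b", "c"])
print([next(rotation) for _ in range(4)])

weighted = weighted_round_robin({"big": 2, "small": 1})
print([next(weighted) for _ in range(6)])

print(least_connections({"a": 5, "b": 2, "c": 9}))
```

Round robin ignores how busy each server is, which is the failure mode described above. Least connections reacts to it directly.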
Health Checks
Load balancers need to know which backends are healthy.
Health Check Types:
TCP Check:
- Can we connect to port 80?
- Fast, basic
HTTP Check:
- Does GET /health return 200?
- Application-aware
Custom Check:
- Does /health return {"status": "ok", "db": "connected"}?
- Deep health verification
Health check parameters:
- Interval: How often to check (e.g., 10 seconds)
- Timeout: How long to wait for response (e.g., 5 seconds)
- Threshold: How many failures before marking unhealthy (e.g., 3)
- Recovery: How many successes before marking healthy (e.g., 2)
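The threshold and recovery parameters describe a small state machine. Here is one way to sketch it, assuming the counts mean consecutive results. The class and parameter names are invented for illustration and do not match any particular load balancer.

```python
class BackendHealth:
    """Flip state only after enough consecutive results contradict it."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0            # consecutive checks that disagree with current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one health check result; return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0        # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


backend = BackendHealth()
print([backend.record(result) for result in (False, False, False, True, True)])
```

Requiring several results in a row keeps one slow response from pulling a backend out of rotation, at the cost of reacting a few intervals late.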
Sticky Sessions
Keep a user connected to the same backend server.
Methods:
- Cookie-based: Load balancer sets a cookie with server ID
- IP-based: Hash client IP (problems with NAT)
- Application-based: App sets session cookie, LB reads it
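For the IP-based method, the core idea is a stable hash of the client address. The sketch below is a minimal version. The function name and backend names are hypothetical, and hashlib is used because Python's built-in hash() for strings changes between processes.

```python
import hashlib


def pick_backend(client_ip: str, backends: list) -> str:
    """Map a client IP to one backend using a stable hash."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["app-1", "app-2", "app-3"]
first = pick_backend("203.0.113.7", backends)
second = pick_backend("203.0.113.7", backends)
print(first == second)      # same client IP, same backend
```

This also shows the NAT problem: every user behind one public IP hashes to the same backend. Adding or removing a backend changes the modulus and reshuffles most clients, which is the problem consistent hashing addresses.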
Trade-offs:
Pros:
+ Session state stays on one server
+ Simpler application code
+ Better cache hit rates
Cons:
- Uneven load distribution
- Server failure loses sessions
- Harder to scale down
Better alternative: Externalize session state to Redis or database.
Firewalls & Security
Firewall Rules
Firewalls filter traffic based on rules evaluated in order.
Rule Structure:
[Priority] [Action] [Protocol] [Source] [Destination] [Port]
Example rules:
1. ALLOW TCP 10.0.0.0/8 any 22 # SSH from internal
2. ALLOW TCP any any 443 # HTTPS from anywhere
3. ALLOW TCP any any 80 # HTTP from anywhere
4. DENY any any any any # Default deny
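"Evaluated in order" means first match wins. The sketch below models the four example rules and evaluates a packet against them. The Rule type and evaluate function are illustrative names only, and the destination field is left out because every example rule uses "any" there.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str             # "ALLOW" or "DENY"
    protocol: str           # "tcp", "udp", or "any"
    source: str             # CIDR block, or "any"
    port: Optional[int]     # None means any port


RULES = [
    Rule("ALLOW", "tcp", "10.0.0.0/8", 22),   # SSH from internal
    Rule("ALLOW", "tcp", "any", 443),         # HTTPS from anywhere
    Rule("ALLOW", "tcp", "any", 80),          # HTTP from anywhere
    Rule("DENY", "any", "any", None),         # Default deny
]


def evaluate(rules, protocol: str, source_ip: str, port: int) -> str:
    """Return the action of the first rule that matches the packet."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol not in ("any", protocol):
            continue
        if rule.source != "any" and address not in ipaddress.ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "DENY"           # nothing matched: fail closed


print(evaluate(RULES, "tcp", "10.1.2.3", 22))
print(evaluate(RULES, "tcp", "198.51.100.9", 22))
```

Because the first match wins, putting the default deny anywhere but last would shadow every rule after it.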
Stateful vs Stateless:
| Stateful | Stateless |
|---|---|
| Tracks connections | No connection tracking |
| Return traffic automatic | Need explicit return rules |
| More memory usage | Less resource intensive |
| Easier to configure | More rules needed |
| Security groups (AWS) | NACLs (AWS) |
Network Segmentation
Divide networks into security zones:
┌─────────────────────────────────────────────┐
│ Internet │
└──────────────────────┬──────────────────────┘
│
┌────────▼────────┐
│ Load Balancer │
│ (Public) │
└────────┬────────┘
│
┌──────────────────────┼──────────────────────┐
│ DMZ │ │
│ ┌───────▼───────┐ │
│ │ Web Servers │ │
│ └───────┬───────┘ │
└──────────────────────┼──────────────────────┘
│ (Firewall)
┌──────────────────────┼──────────────────────┐
│ Private │ │
│ ┌───────▼───────┐ │
│ │ App Servers │ │
│ └───────┬───────┘ │
└──────────────────────┼──────────────────────┘
│ (Firewall)
┌──────────────────────┼──────────────────────┐
│ Data │ │
│ ┌───────▼───────┐ │
│ │ Databases │ │
│ └───────────────┘ │
└─────────────────────────────────────────────┘
NAT (Network Address Translation)
Translates private IPs to public IPs.
Types:
- SNAT (Source NAT): Change source IP (outbound traffic)
- DNAT (Destination NAT): Change destination IP (inbound traffic)
- PAT (Port Address Translation): Many private IPs share one public IP
NAT Example (outbound):
Private               NAT Gateway                    Internet
10.0.1.50 ──────────> 203.0.113.5:12345 ──────────> example.com
(private source IP)   (source IP translated to public IP + port)
NAT Gateway/Instance: Allows private subnets to access internet without being directly accessible.
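PAT works because the gateway keeps a translation table keyed by port. Here is a toy version of that table. The class and method names are invented, the port allocation is simplistic, and real NAT devices also track protocol, destination, and timeouts.

```python
class PortAddressTranslator:
    """Many private (ip, port) pairs share one public IP via distinct public ports."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._next_port = first_port
        self._outbound = {}     # (private_ip, private_port) -> public_port
        self._inbound = {}      # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet, reusing an existing mapping."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = self._next_port
            self._next_port += 1
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port: int):
        """Find the private endpoint for a reply, or None if nobody asked for it."""
        return self._inbound.get(public_port)


nat = PortAddressTranslator("203.0.113.5", first_port=12345)
print(nat.translate_outbound("10.0.1.50", 40000))
print(nat.translate_inbound(12345))
print(nat.translate_inbound(9999))
```

The last lookup returning None is the security side effect: unsolicited inbound traffic has no mapping, so it has nowhere to go.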
Troubleshooting Tools
Connectivity Testing
# Basic connectivity
ping example.com
ping -c 4 example.com # Stop after 4 pings
# Trace route to destination
traceroute example.com # Linux/Mac
tracert example.com # Windows
mtr example.com # Better traceroute (continuous)
# Test specific port
telnet example.com 80
nc -zv example.com 80 # Netcat
nc -zv example.com 20-25 # Port range
Network Statistics
# Show listening ports
netstat -tulpn # Linux
netstat -an | grep LISTEN # Mac
# Modern alternative to netstat
ss -tulpn # Show listening ports
ss -s # Socket statistics
# What's using a port?
lsof -i :80 # What process has port 80
fuser 80/tcp # Alternative
Packet Analysis
# Capture packets
tcpdump -i eth0 # All traffic on interface
tcpdump -i eth0 port 80 # Only port 80
tcpdump -i eth0 host 10.0.1.50 # Only specific host
tcpdump -i eth0 -w capture.pcap # Save to file
# Read capture file
tcpdump -r capture.pcap
wireshark capture.pcap # GUI analysis
# Useful filters
tcpdump 'tcp[tcpflags] & (tcp-syn) != 0' # Only SYN packets
tcpdump -A port 80 # Show ASCII content
HTTP Debugging with curl
# Basic request
curl https://example.com
# Show headers
curl -I https://example.com # HEAD request (headers only)
curl -i https://example.com # Include headers in output
# Verbose output (see handshake)
curl -v https://example.com
# Follow redirects
curl -L https://example.com
# Custom headers
curl -H "Authorization: Bearer token" https://api.example.com
# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com
# Time the request
curl -w "@curl-format.txt" -o /dev/null -s https://example.com
# curl-format.txt:
# time_namelookup: %{time_namelookup}s\n
# time_connect: %{time_connect}s\n
# time_appconnect: %{time_appconnect}s\n
# time_total: %{time_total}s\n
Common Interview Scenarios
"What happens when you type google.com in a browser?"
This classic question tests end-to-end understanding:
1. URL Parsing: Browser extracts protocol (https), hostname (google.com), path (/)
2. DNS Resolution:
   - Check browser cache
   - Check OS cache
   - Query DNS resolver
   - Recursive lookup through root → TLD → authoritative
   - Cache result based on TTL
3. TCP Connection:
   - Three-way handshake (SYN, SYN-ACK, ACK)
   - Connection to IP on port 443
4. TLS Handshake:
   - Client Hello (supported ciphers)
   - Server Hello (chosen cipher, certificate)
   - Certificate verification
   - Key exchange
   - Encrypted channel established
5. HTTP Request:
   - GET / HTTP/2
   - Headers (Host, User-Agent, Accept, etc.)
6. Server Processing:
   - Load balancer routes request
   - Web server processes
   - Backend calls if needed
   - Response generated
7. Response:
   - Status code (200 OK)
   - Headers (Content-Type, Cache-Control)
   - Body (HTML)
8. Rendering:
   - Parse HTML
   - Fetch CSS, JS, images (parallel requests)
   - Build DOM and CSSOM
   - Execute JavaScript
   - Paint to screen
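The URL parsing step is worth being able to demonstrate, because it decides which host gets resolved and which port the TCP connection targets. Here is a sketch with the standard urllib.parse module. The helper name and the default-port table are assumptions for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_target(url: str):
    """Return (scheme, hostname, port, path), filling in the implied port and path."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port, parts.path or "/"


print(parse_target("https://google.com"))
print(parse_target("http://example.com:8080/health"))
```

Everything after this point in the walkthrough uses these four values: the hostname goes to DNS, the port to the TCP connect, and the path into the request line.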
Debugging Connectivity Issues
Systematic approach:
# 1. Can we resolve the hostname?
dig api.example.com
# If NXDOMAIN → DNS issue
# 2. Can we reach the IP?
ping 93.184.216.34
# If timeout → routing/firewall issue (or ICMP is blocked, so still test the port)
# 3. Can we reach the port?
nc -zv 93.184.216.34 443
# If refused → service not running or firewall
# 4. Is TLS working?
openssl s_client -connect api.example.com:443
# If handshake fails → certificate issue
# 5. Does HTTP work?
curl -v https://api.example.com/health
# If error → application issue
Designing for High Availability
Key patterns:
- Multiple availability zones: Servers in different data centers
- Load balancer with health checks: Automatically remove failed instances
- DNS failover: Route53 health checks, multiple A records
- Connection draining: Graceful shutdown for existing connections
- Retry with backoff: Client-side resilience
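Of these patterns, retry with backoff is the one you are most likely to be asked to write out. Below is a minimal sketch with exponential backoff and random jitter. The function name, defaults, and the choice to retry only on ConnectionError are assumptions, and the sleep function is injectable so the logic can be exercised without waiting.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call operation(); on ConnectionError wait, then try again with a longer ceiling."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                               # out of attempts: surface the error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))       # jitter spreads out retrying clients


attempts = []


def flaky_call():
    """Stand-in for a network call that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("backend unavailable")
    return "ok"


print(retry_with_backoff(flaky_call, sleep=lambda seconds: None))
```

The jitter matters as much as the backoff: without it, every client that failed at the same moment retries at the same moment too.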
High Availability Architecture:
┌─────────────┐
│ Route53 │ (DNS with health checks)
└──────┬──────┘
│
┌──────▼──────┐
│ ALB │ (Cross-zone load balancing)
└──────┬──────┘
│
┌─────────┼─────────┐
│ │ │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│ AZ-1 │ │ AZ-2 │ │ AZ-3 │
│ ┌───┐ │ │ ┌───┐ │ │ ┌───┐ │
│ │App│ │ │ │App│ │ │ │App│ │
│ └───┘ │ │ └───┘ │ │ └───┘ │
└───────┘ └───────┘ └───────┘
Quick Reference
Essential Commands
| Task | Command |
|---|---|
| DNS lookup | dig example.com |
| Trace route | mtr example.com |
| Test port | nc -zv host port |
| Show listening ports | ss -tulpn |
| Capture packets | tcpdump -i eth0 |
| HTTP request | curl -v https://example.com |
| Check certificate | openssl s_client -connect host:443 |
Common Ports Quick Reference
22 SSH 443 HTTPS 6379 Redis
80 HTTP 3306 MySQL 27017 MongoDB
53 DNS 5432 PostgreSQL 9200 Elasticsearch
25 SMTP 8080 Alt HTTP 2379 etcd
Troubleshooting Checklist
□ DNS resolving correctly?
□ IP reachable (ping)?
□ Port open (nc/telnet)?
□ Firewall rules allow traffic?
□ Service running on target?
□ TLS certificate valid?
□ Application responding?
□ Correct response code?
Final Thoughts
Networking interviews test practical knowledge—not memorized theory. Interviewers want to see:
- Systematic debugging: Work through layers methodically
- Tool familiarity: Know dig, curl, tcpdump, netstat
- Protocol understanding: TCP vs UDP, HTTP status codes, DNS records
- Security awareness: Firewalls, encryption, segmentation
The best preparation is hands-on practice. Break things, debug them, understand why they failed. That experience shows in interviews.
