Networking knowledge separates good DevOps engineers from great ones. When production goes down at 3 AM, you need to know whether the problem is DNS, a firewall rule, or a failing load balancer—and you need to know fast.
This guide covers the networking fundamentals that come up in DevOps, SRE, and backend interviews. Not certification-level theory, but practical knowledge for debugging real systems.
Table of Contents
- Networking Fundamentals Questions
- DNS Questions
- HTTP and HTTPS Questions
- Load Balancing Questions
- Firewall and Security Questions
- Network Troubleshooting Questions
- Classic Interview Scenario Questions
- Quick Reference
Networking Fundamentals Questions
Understanding the core networking concepts is essential for any DevOps or backend engineer interview.
What is the OSI model and which layers matter most for troubleshooting?
The OSI model is a conceptual framework that describes how data moves through a network in seven layers. While you don't need to memorize all seven layers for most interviews, understanding the practical layers helps you systematically debug network issues.
When troubleshooting, you typically work up the stack: if you can't ping, suspect Layer 3 (unless ICMP is simply being filtered); if you can ping but can't connect to a port, look at Layer 4; if the connection works but the app fails, look at Layer 7.
| Layer | Name | Examples | Role |
|---|---|---|---|
| 7 | Application | HTTP, DNS, SSH | What your app speaks |
| 4 | Transport | TCP, UDP | How data gets delivered |
| 3 | Network | IP, routing | Where data goes |
| 2 | Data Link | MAC addresses, switches | Local network |
| 1 | Physical | Cables, signals | Hardware |
What is the difference between TCP and UDP?
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are the two main transport layer protocols, each designed for different use cases. TCP prioritizes reliability, while UDP prioritizes speed.
TCP is connection-oriented, meaning it establishes a connection before sending data using a three-way handshake. It guarantees delivery through acknowledgments, ensures packets arrive in order, and implements flow control and congestion control. This makes TCP ideal for HTTP, HTTPS, SSH, FTP, SMTP, and database connections.
UDP is connectionless—it simply sends packets without establishing a connection first. There's no delivery guarantee or ordering guarantee, but this results in lower latency and less overhead. UDP is used for DNS queries, video streaming, gaming, and VoIP where occasional packet loss is acceptable.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: SYN (I want to connect)
S->>C: SYN-ACK (OK, I acknowledge)
C->>S: ACK (Great, let's talk)
Note over C,S: DATA (Connection established)
When would you choose UDP over TCP?
You would choose UDP over TCP when latency matters more than reliability. The key insight is that in real-time applications, a retransmitted packet arriving late is often useless.
Video streaming can skip frames—if a packet is lost, showing the next frame is better than waiting for retransmission. DNS queries are small and can simply retry if the response doesn't arrive. Gaming needs real-time updates where old positional data is worthless by the time it arrives. VoIP works similarly—slightly degraded audio quality is better than choppy delayed audio from retransmissions.
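The contrast between connection-oriented TCP and connectionless UDP can be seen on the loopback interface. The following is a minimal sketch, not production code: the function names `tcp_echo_once` and `udp_echo_once` are made up for illustration, and both use throwaway servers bound to `127.0.0.1`.

```python
import socket
import threading


def tcp_echo_once(message: bytes) -> bytes:
    """Send one message over TCP to a throwaway loopback server and return the echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()        # hands back a connection whose handshake is complete
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()

    # connect() is where SYN / SYN-ACK / ACK happens; the kernel performs it for us.
    with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join(timeout=2)
    server.close()
    return reply


def udp_echo_once(message: bytes) -> bytes:
    """Send one datagram over UDP: no connect step, no handshake, no delivery guarantee."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    client.sendto(message, ("127.0.0.1", port))   # fire and forget

    data, addr = server.recvfrom(1024)
    server.sendto(data, addr)
    reply, _ = client.recvfrom(1024)
    client.close()
    server.close()
    return reply
```

Note that the UDP client needs its own timeout: if the datagram were lost, nothing in the protocol would retry it, which is exactly the trade-off described above.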
How do IP addressing and subnetting work?
IP addressing and subnetting are fundamental to network design and troubleshooting. IPv4 addresses consist of four octets (0-255 each), like 192.168.1.100. Subnetting allows you to divide networks into smaller, more manageable segments.
Private IP ranges defined in RFC 1918 are not routable on the internet and can be used freely within your network. The 10.0.0.0/8 range provides 16 million addresses for large networks. The 172.16.0.0/12 range offers 1 million addresses for medium networks. The 192.168.0.0/16 range provides 65,536 addresses for home and small networks.
CIDR notation combines the network address with a prefix length indicating how many bits are used for the network portion:
10.0.0.0/8 = 10.0.0.0 - 10.255.255.255 (16,777,216 IPs)
10.0.0.0/16 = 10.0.0.0 - 10.0.255.255 (65,536 IPs)
10.0.0.0/24 = 10.0.0.0 - 10.0.0.255 (256 IPs)
10.0.0.0/32 = 10.0.0.0 (1 IP - single host)
Subnet math shortcut: Each one-bit increase in prefix length halves the address count, and each one-bit decrease doubles it. A /24 gives 256 addresses (2^8), /25 gives 128 (2^7), /26 gives 64 (2^6). Subtract 2 for the network and broadcast addresses to get the number of usable hosts.
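The arithmetic above can be checked with Python's standard `ipaddress` module. This is a small sketch; `subnet_summary` is a helper name invented here, and the "usable" figure applies the classic subtract-two rule, which does not account for extra addresses a cloud provider may reserve.

```python
import ipaddress


def subnet_summary(cidr: str) -> dict:
    """Return the range and address counts for an IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    total = net.num_addresses            # 2 ** (32 - prefix length)
    # Classic rule: the network and broadcast addresses are not assignable.
    # /31 and /32 have no room for that rule, so they are reported as-is.
    usable = total - 2 if net.prefixlen <= 30 else total
    return {
        "first": str(net.network_address),
        "last": str(net.broadcast_address),
        "total": total,
        "usable": usable,
    }


for cidr in ("10.0.0.0/24", "10.0.0.0/25", "10.0.0.0/26"):
    print(cidr, subnet_summary(cidr))
```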
How would you design IP addressing for a VPC with three subnets?
When designing IP addressing for a cloud VPC, you need to balance having enough addresses for growth while keeping subnets logically organized. A common pattern is to use a /16 for the VPC and /24 for individual subnets.
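This approach gives you 256 possible subnets within your VPC, each with 254 usable IP addresses by the standard network/broadcast rule (cloud providers may reserve more; AWS, for example, reserves five addresses per subnet, leaving 251). That is plenty of room to grow while maintaining clear organization.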
VPC: 10.0.0.0/16 (65,536 addresses total)
Subnets:
- Public: 10.0.1.0/24 (web servers, load balancers)
- Private: 10.0.2.0/24 (application servers)
- Data: 10.0.3.0/24 (databases)
Each subnet has 254 usable IPs, plenty of room to grow.
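A plan like this can be sanity-checked before anything is provisioned. The sketch below uses the standard `ipaddress` module; `validate_plan` and the subnet labels are illustrative names, not part of any cloud SDK.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
all_24s = list(vpc.subnets(new_prefix=24))   # every /24 that fits in the VPC

plan = {
    "public": ipaddress.ip_network("10.0.1.0/24"),
    "private": ipaddress.ip_network("10.0.2.0/24"),
    "data": ipaddress.ip_network("10.0.3.0/24"),
}


def validate_plan(vpc, plan):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    problems = []
    names = list(plan)
    for name in names:
        if not plan[name].subnet_of(vpc):
            problems.append(f"{name} is outside {vpc}")
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if plan[a].overlaps(plan[b]):
                problems.append(f"{a} overlaps {b}")
    return problems


print(len(all_24s), "possible /24 subnets;", "problems:", validate_plan(vpc, plan))
```

Running the same check in CI against your infrastructure-as-code variables is one way to catch overlapping ranges early.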
What ports should every developer know?
Understanding common ports helps you quickly diagnose connection issues and configure firewalls. These are the default ports for each service; most are registered with IANA, but any service can be configured to listen elsewhere.
| Port | Service | Protocol |
|---|---|---|
| 22 | SSH | TCP |
| 80 | HTTP | TCP |
| 443 | HTTPS | TCP |
| 53 | DNS | UDP/TCP |
| 25 | SMTP | TCP |
| 3306 | MySQL | TCP |
| 5432 | PostgreSQL | TCP |
| 6379 | Redis | TCP |
| 27017 | MongoDB | TCP |
DNS Questions
DNS (Domain Name System) translates human-readable domain names to IP addresses. It's involved in almost every network issue you'll debug.
How does DNS resolution work step by step?
DNS resolution is a hierarchical lookup process that converts domain names to IP addresses. Understanding each step helps you debug DNS issues and optimize DNS performance.
When you request a domain, your system checks multiple caches before making network requests. If no cache has the answer, a recursive lookup begins, traveling from root servers down to authoritative nameservers.
1. Browser cache → Already know google.com? Use cached IP
2. OS cache → Check /etc/hosts and system DNS cache
3. Resolver → Ask configured DNS server (ISP, 8.8.8.8, etc.)
4. Root servers → "Who handles .com?"
5. TLD servers → "Who handles google.com?"
6. Authoritative NS → "google.com is 142.250.x.x"
7. Cache the result → Store for TTL duration
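Recursive vs Iterative resolution: In recursive resolution, the resolver does all the work and returns the final answer. In iterative resolution, each server says "I don't know, ask them" and the client must make subsequent queries. In practice, your machine sends a recursive query to its resolver, and the resolver then makes iterative queries to the root, TLD, and authoritative servers.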
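The referral chain can be modelled in a few lines. This is a toy sketch only: the dictionaries `ROOT`, `TLD_SERVERS`, and `AUTHORITATIVE` and the server labels are hypothetical in-memory stand-ins, whereas real resolvers speak the DNS wire protocol and cache results along the way.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory data standing in for real DNS servers.
ROOT: Dict[str, str] = {"com": "tld-com"}                  # TLD -> TLD server
TLD_SERVERS: Dict[str, Dict[str, str]] = {
    "tld-com": {"example.com": "ns-example"},              # domain -> authoritative NS
}
AUTHORITATIVE: Dict[str, Dict[str, str]] = {
    "ns-example": {
        "example.com": "93.184.216.34",
        "www.example.com": "93.184.216.34",
    },
}


def resolve_iteratively(name: str) -> Tuple[str, List[str]]:
    """Follow referrals root -> TLD -> authoritative; return (ip, servers asked)."""
    name = name.rstrip(".").lower()
    labels = name.split(".")
    asked = ["root"]

    tld_server = ROOT.get(labels[-1])                      # "Who handles .com?"
    if tld_server is None:
        raise LookupError(f"NXDOMAIN: no such TLD in {name!r}")
    asked.append(tld_server)

    domain = ".".join(labels[-2:])                         # "Who handles example.com?"
    auth_server = TLD_SERVERS[tld_server].get(domain)
    if auth_server is None:
        raise LookupError(f"NXDOMAIN: {domain!r} is not delegated")
    asked.append(auth_server)

    ip = AUTHORITATIVE[auth_server].get(name)              # the actual answer
    if ip is None:
        raise LookupError(f"NXDOMAIN: {name!r} has no A record")
    return ip, asked
```

Each `asked.append` corresponds to one "I don't know, ask them" referral; a recursive resolver hides this whole walk behind a single query.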
What are the main DNS record types and when do you use each?
DNS records serve different purposes, and choosing the right record type is essential for proper domain configuration. Each record type stores specific information about how to handle requests for a domain.
| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 address | example.com → 2606:2800:220:1:... |
| CNAME | Alias to another name | www.example.com → example.com |
| MX | Mail server (with priority) | example.com → mail.example.com (10) |
| TXT | Arbitrary text | SPF, DKIM, domain verification |
| NS | Nameserver delegation | example.com → ns1.example.com |
| PTR | Reverse lookup (IP → name) | 34.216.184.93.in-addr.arpa → example.com |
| SOA | Start of Authority | Zone metadata, serial numbers |
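Important CNAME restriction: CNAME records cannot be used at the zone apex (root domain), because a CNAME cannot coexist with the SOA and NS records that must live there. You can use www.example.com as a CNAME, but example.com itself cannot be a CNAME. Use an A record or, where your DNS provider offers one, an ALIAS/ANAME record instead.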
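Two of these rules are easy to check in code. The sketch below builds the name used for a PTR lookup with the standard `ipaddress` module and flags a CNAME at the apex; `apex_cname_errors` and the `(name, type, value)` record tuples are a made-up representation for illustration, not any DNS provider's format.

```python
import ipaddress
from typing import List, Tuple

Record = Tuple[str, str, str]   # (name, type, value), a hypothetical format


def ptr_name(ip: str) -> str:
    """Name that is queried for a reverse lookup of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


def apex_cname_errors(zone: str, records: List[Record]) -> List[str]:
    """Flag any CNAME placed at the zone apex."""
    apex = zone.rstrip(".").lower()
    return [
        f"CNAME not allowed at zone apex {apex}"
        for name, rtype, _ in records
        if rtype.upper() == "CNAME" and name.rstrip(".").lower() == apex
    ]
```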
What is TTL and how do you choose the right value?
TTL (Time To Live) determines how long DNS records are cached by resolvers and clients. Choosing the right TTL involves balancing between quick propagation and reduced DNS query load.
Low TTL (60-300 seconds) enables quick propagation when you make changes and easier failover during incidents. However, it results in more DNS queries and slightly higher latency for users.
High TTL (3600-86400 seconds) means fewer DNS queries and better performance since responses are cached longer. The downside is slow propagation when you need to make changes and difficulty switching quickly during emergencies.
Best practice: Lower your TTL before planned changes (like migrations), then raise it back afterward. This gives you flexibility when you need it while maintaining performance during normal operation.
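The effect of TTL on caching can be illustrated with a tiny cache. This is a sketch under simplifying assumptions: `TtlCache` is a made-up class, and the injectable `clock` exists only so the expiry behaviour can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Minimal DNS-style cache: entries vanish once their TTL has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:      # stale: force a fresh lookup
            del self._entries[name]
            return None
        return value
```

With a 300-second TTL, a changed record can keep being served from caches like this for up to five minutes, which is why lowering the TTL ahead of a migration matters.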
How do you troubleshoot DNS issues?
DNS troubleshooting requires systematic investigation using command-line tools. These commands help you identify whether DNS is the root cause of connectivity problems.
# Basic lookup
dig example.com
nslookup example.com
# Query specific record type
dig example.com MX
dig example.com TXT
# Query specific nameserver
dig @8.8.8.8 example.com
# Trace the full resolution path
dig +trace example.com
# Check TTL remaining
dig example.com | grep -E "^example"
# example.com. 234 IN A 93.184.216.34
# ^^^-- seconds until cache expires
# Reverse lookup
dig -x 93.184.216.34
Common DNS error codes:
- NXDOMAIN: Domain doesn't exist (check spelling or registration)
- SERVFAIL: Nameserver error (problem with authoritative servers)
- Timeout: Network issue or DNS server down
- Wrong IP: Stale cache or misconfiguration
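When these checks need to run from a script, the relevant pieces of `dig` output can be pulled out with a little parsing. The following is a rough sketch: `parse_answer_line` and `dig_status` are illustrative helper names, and they assume the usual text layout of an answer line and of the `status:` field in the response header.

```python
import re
from typing import NamedTuple, Optional


class Answer(NamedTuple):
    name: str
    ttl: int       # seconds until the cached record expires
    rclass: str
    rtype: str
    data: str


def parse_answer_line(line: str) -> Answer:
    """Split one answer line such as 'example.com. 234 IN A 93.184.216.34'."""
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Answer(name, int(ttl), rclass, rtype, data.strip())


def dig_status(output: str) -> Optional[str]:
    """Extract the response status (NOERROR, NXDOMAIN, SERVFAIL, ...) if present."""
    match = re.search(r"status:\s*([A-Z]+)", output)
    return match.group(1) if match else None
```

A timeout produces no status at all, so `dig_status` returning `None` is itself a signal to look at the network path or the DNS server.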
HTTP and HTTPS Questions
Understanding HTTP is essential for debugging web applications and APIs.
What are the HTTP methods and what makes them idempotent or safe?
HTTP methods define the intended action for a request. Understanding idempotence and safety helps you design APIs correctly and debug unexpected behavior.
Idempotent means multiple identical requests have the same effect as a single request. Safe means the request doesn't modify server state. These properties affect how clients can retry requests and how proxies can cache responses.
| Method | Purpose | Idempotent | Safe |
|---|---|---|---|
| GET | Retrieve resource | Yes | Yes |
| POST | Create resource | No | No |
| PUT | Replace resource | Yes | No |
| PATCH | Partial update | No | No |
| DELETE | Remove resource | Yes | No |
| HEAD | GET without body | Yes | Yes |
| OPTIONS | Get allowed methods | Yes | Yes |
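The difference between idempotent and non-idempotent methods shows up as soon as a client retries. Below is a toy illustration, assuming an invented `InMemoryApi` class in place of a real server; the `IDEMPOTENT` set simply mirrors the table above.

```python
import itertools

IDEMPOTENT = {"GET", "PUT", "DELETE", "HEAD", "OPTIONS"}


def safe_to_retry(method: str) -> bool:
    """A blind retry is only safe when repeating the request cannot change the outcome."""
    return method.upper() in IDEMPOTENT


class InMemoryApi:
    """Hypothetical resource store used to show the effect of repeated requests."""

    def __init__(self) -> None:
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, body: dict) -> int:
        new_id = next(self._ids)          # every POST creates a new resource
        self.items[new_id] = body
        return new_id

    def put(self, item_id: int, body: dict) -> None:
        self.items[item_id] = body        # replacing twice leaves the same state

    def delete(self, item_id: int) -> None:
        self.items.pop(item_id, None)     # deleting twice leaves the same state
```

Retrying `post` after a timeout may create a duplicate, which is why APIs that need retryable creation often add a client-supplied idempotency key.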
What HTTP status codes should you know and what do they mean?
HTTP status codes communicate the result of a request. Knowing the common codes helps you quickly understand what's happening when debugging issues.
1xx - Informational (100 Continue, 101 Switching Protocols)
2xx - Success (200 OK, 201 Created, 204 No Content)
3xx - Redirection (301 Moved Permanently, 302 Found, 304 Not Modified)
4xx - Client Error (400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found)
5xx - Server Error (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout)
Key distinctions:
- 401 vs 403: 401 means not authenticated (who are you?), 403 means authenticated but not authorized (you can't do that)
- 502 vs 503 vs 504: 502 means bad response from upstream server, 503 means server is overloaded or down for maintenance, 504 means upstream server timed out
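These groupings map directly onto the standard library's `http.HTTPStatus` enum. The sketch below adds a first-guess hint for the codes contrasted above; `describe`, `CLASSES`, and `HINTS` are illustrative names, and the hints paraphrase this guide rather than any specification.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

# Where to look first, following the distinctions above.
HINTS = {
    401: "no valid credentials were presented",
    403: "credentials are fine, but permission is missing",
    502: "upstream returned an invalid response",
    503: "server overloaded or in maintenance",
    504: "upstream did not answer in time",
}


def describe(code: int) -> str:
    """Render a status code with its reason phrase, class, and optional hint."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                    # not a code the standard library knows
        phrase = "Unknown"
    text = f"{code} {phrase} ({CLASSES.get(code // 100, 'Unknown class')})"
    hint = HINTS.get(code)
    return f"{text}: {hint}" if hint else text
```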
How does the TLS/SSL handshake work?
TLS (Transport Layer Security) encrypts communication between client and server. Understanding the handshake helps you debug certificate issues and performance problems.
The handshake establishes a secure connection by verifying the server's identity and generating shared encryption keys. This all happens before any application data is exchanged.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: Client Hello (cipher suites, TLS version)
S->>C: Server Hello (chosen cipher, certificate)
Note over C: Verify certificate chain
C->>S: Key Exchange (generate session keys)
S->>C: Key Exchange
Note over C,S: Encrypted Traffic
Certificate chain verification:
- Server certificate (your domain)
- Intermediate certificate(s)
- Root certificate (trusted by browsers)
Common TLS issues and how to debug them:
- Certificate expired: Check the notAfter date
- Name mismatch: Certificate doesn't match the domain you're connecting to
- Incomplete chain: Missing intermediate certificate
- Self-signed: Not trusted by default (need to add to trust store)
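The expiry check is a natural candidate for automation. Here is a minimal sketch using Python's standard `ssl` module: `days_until_expiry` and `fetch_not_after` are helper names invented for this example, and `fetch_not_after` needs network access and relies on the system trust store, so an expired or untrusted certificate would fail the handshake before any date is returned.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp (negative if expired).

    `not_after` uses the format found in certificates, e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port with full verification and return the notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

An alert when `days_until_expiry(fetch_not_after(host))` drops below some threshold (30 days is a common choice) catches most expiry incidents before users do.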
# Check certificate
openssl s_client -connect example.com:443 -servername example.com
# Check expiration
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates
What are the differences between HTTP/1.1, HTTP/2, and HTTP/3?
HTTP has evolved significantly to address performance limitations. Each version introduces improvements in how data is transmitted between clients and servers.
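HTTP/1.1 limitations: Each connection handles only one request at a time, so a slow response blocks everything queued behind it (head-of-line blocking); headers are uncompressed text repeated on every request; and there is no server push capability.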
HTTP/2 improvements: Multiplexing allows multiple requests over a single connection. Header compression (HPACK) reduces overhead. Server push lets servers send resources proactively (though major browsers have since removed support for it). The binary protocol is more efficient to parse.
HTTP/3 improvements: Uses QUIC protocol built on UDP instead of TCP. Eliminates head-of-line blocking at the transport layer. Offers faster connection establishment. Provides better mobile performance through connection migration when switching networks.
Load Balancing Questions
Load balancers distribute traffic across multiple servers to improve availability and performance.
What is the difference between Layer 4 and Layer 7 load balancing?
Layer 4 and Layer 7 load balancing operate at different levels of the network stack and offer different capabilities. Your choice depends on what information you need to make routing decisions.
Layer 4 (Transport) load balancers route based on IP address and port without inspecting the content. They're faster and less CPU intensive but can only make routing decisions based on network-level information. Use them for TCP/UDP passthrough or non-HTTP protocols.
Layer 7 (Application) load balancers inspect HTTP headers, URLs, and cookies. They can route based on content, perform SSL termination, and modify requests. Use them for HTTP routing, path-based routing, and A/B testing.
flowchart LR
subgraph L4["Layer 4 Load Balancer"]
C1["Client"] --> L4LB["L4 LB<br/>(routes by IP:port)"]
L4LB --> S1["Server"]
end
subgraph L7["Layer 7 Load Balancer"]
C2["Client"] --> L7LB["L7 LB"]
L7LB -->|"/api/*"| API["API Servers"]
L7LB -->|"/static/*"| CDN["CDN"]
end
What load balancing algorithms exist and when do you use each?
Load balancing algorithms determine how traffic is distributed across backend servers. The right choice depends on your server capacities and request characteristics.
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Equal capacity servers |
| Weighted Round Robin | Rotate with weights | Mixed capacity servers |
| Least Connections | Send to server with fewest connections | Varying request duration |
| IP Hash | Hash client IP to choose server | Session affinity |
| Least Response Time | Send to fastest responding server | Performance optimization |
| Random | Random selection | Simple, surprisingly effective |
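Several of these algorithms fit in a few lines each. The sketch below is illustrative rather than how any particular load balancer implements them: the function names are made up, and the weighted variant uses naive list expansion, which sends bursts to heavier servers instead of interleaving them smoothly.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Rotate through servers in order, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive weighting: repeat each server `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server currently holding the fewest open connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a stable server choice (changes if the server list changes)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note the caveat on `ip_hash`: adding or removing a server reshuffles most clients, which is the problem consistent hashing is designed to reduce.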
How does the choice of load balancing algorithm affect slow responses?
If users report slow responses and you're using round robin with servers of different capacities, slower servers receive equal traffic and become bottlenecks. The solution is to switch to least connections or weighted round robin.
If requests have varying durations (some quick, some slow), round robin can cause queue buildup on servers that happen to receive multiple slow requests. Least connections mitigates this by always sending to the server with the fewest active connections, a reasonable proxy for available capacity.
How do health checks work in load balancing?
Health checks allow load balancers to detect unhealthy backends and stop sending traffic to them. Without proper health checks, users may be routed to failed servers.
Health check types:
- TCP Check: Can we connect to the port? Fast but basic.
- HTTP Check: Does GET /health return 200? Application-aware.
- Custom Check: Does /health return {"status": "ok", "db": "connected"}? Deep verification.
Health check parameters:
- Interval: How often to check (e.g., 10 seconds)
- Timeout: How long to wait for response (e.g., 5 seconds)
- Threshold: How many failures before marking unhealthy (e.g., 3)
- Recovery: How many successes before marking healthy (e.g., 2)
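The threshold and recovery parameters amount to a small state machine per backend. The following sketch uses an invented `BackendHealth` class with the example values above as defaults; interval and timeout are left to whatever runs the probe.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend; flips state only after enough consecutive contrary results."""

    unhealthy_threshold: int = 3   # failures in a row before marking unhealthy
    healthy_threshold: int = 2     # successes in a row before marking healthy again
    healthy: bool = True
    _streak: int = 0               # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one health check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0       # result agrees with current state: reset the streak
        else:
            self._streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

Requiring consecutive results in both directions avoids flapping: a single dropped probe does not eject a server, and a single lucky probe does not bring a broken one back.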
What are sticky sessions and what are the trade-offs?
Sticky sessions (session affinity) keep a user connected to the same backend server throughout their session. This simplifies applications that store session state locally but introduces operational challenges.
Methods for implementing sticky sessions:
- Cookie-based: Load balancer sets a cookie with server ID
- IP-based: Hash client IP (problems with NAT and mobile users)
- Application-based: App sets session cookie, load balancer reads it
Trade-offs:
| Pros | Cons |
|---|---|
| Session state stays on one server | Uneven load distribution |
| Simpler application code | Server failure loses sessions |
| Better cache hit rates | Harder to scale down |
Better alternative: Externalize session state to Redis or a database. This allows any server to handle any request while maintaining session continuity.
Firewall and Security Questions
Understanding firewalls and network security is essential for secure infrastructure design.
How do firewall rules work?
Firewalls filter network traffic based on rules evaluated in priority order. Understanding rule structure and evaluation order is critical for both security and troubleshooting.
Rules typically specify priority, action (allow/deny), protocol, source address, destination address, and port.
Rule Structure:
[Priority] [Action] [Protocol] [Source] [Destination] [Port]
Example rules:
1. ALLOW TCP 10.0.0.0/8 any 22 # SSH from internal
2. ALLOW TCP any any 443 # HTTPS from anywhere
3. ALLOW TCP any any 80 # HTTP from anywhere
4. DENY any any any any # Default deny
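First-match evaluation of the example rules can be sketched as follows. This is a simplified model, not any vendor's rule engine: the `Rule` tuple and `evaluate` function are illustrative, and the destination column is left out because every example rule uses "any" there.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    action: str            # "ALLOW" or "DENY"
    protocol: str          # "tcp", "udp", or "any"
    source: str            # CIDR block or "any"
    port: Optional[int]    # None means any port


RULES = [
    Rule("ALLOW", "tcp", "10.0.0.0/8", 22),   # SSH from internal
    Rule("ALLOW", "tcp", "any", 443),         # HTTPS from anywhere
    Rule("ALLOW", "tcp", "any", 80),          # HTTP from anywhere
    Rule("DENY", "any", "any", None),         # default deny
]


def evaluate(rules: List[Rule], protocol: str, source_ip: str, port: int) -> str:
    """Return the action of the first rule that matches, in priority order."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol not in ("any", protocol):
            continue
        if rule.source != "any" and addr not in ipaddress.ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "DENY"          # nothing matched: fail closed
```

Because evaluation stops at the first match, moving the default deny above the allow rules would block everything, which is a common cause of "the rule is there but traffic is still dropped".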
What is the difference between stateful and stateless firewalls?
Stateful and stateless firewalls differ in how they track connections. This affects both configuration complexity and resource usage.
| Stateful | Stateless |
|---|---|
| Tracks connections | No connection tracking |
| Return traffic automatic | Need explicit return rules |
| More memory usage | Less resource intensive |
| Easier to configure | More rules needed |
| Security groups (AWS) | NACLs (AWS) |
Stateful firewalls remember that an outbound connection was made and automatically allow the return traffic. Stateless firewalls require you to explicitly allow traffic in both directions.
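Connection tracking is what makes the return traffic "automatic". The toy model below assumes a policy of allow-all outbound and deny-all unsolicited inbound; `StatefulFirewall` is a made-up class, and real implementations also track protocol, TCP state, and timeouts.

```python
from typing import Set, Tuple

Flow = Tuple[str, int, str, int]   # (src_ip, src_port, dst_ip, dst_port)


class StatefulFirewall:
    """Remembers outbound flows so matching replies are let back in."""

    def __init__(self) -> None:
        self._tracked: Set[Flow] = set()

    def outbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
        self._tracked.add((src_ip, src_port, dst_ip, dst_port))
        return True                # assumed policy: all outbound traffic allowed

    def inbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
        # A reply is a tracked flow with source and destination swapped.
        return (dst_ip, dst_port, src_ip, src_port) in self._tracked
```

A stateless filter has no `_tracked` set, so the same reply would need an explicit inbound rule covering the ephemeral port range.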
How should you segment networks for security?
Network segmentation divides your infrastructure into security zones, limiting the blast radius of a breach and controlling traffic flow between components.
The principle is defense in depth—multiple layers of security that an attacker must breach. Even if someone compromises your web servers, they shouldn't automatically have access to your databases.
flowchart TB
Internet["Internet"]
Internet --> LB["Load Balancer<br/>(Public)"]
subgraph DMZ["DMZ Zone"]
Web["Web Servers"]
end
subgraph Private["Private Zone"]
App["App Servers"]
end
subgraph Data["Data Zone"]
DB["Databases"]
end
LB --> Web
Web -->|"Firewall"| App
App -->|"Firewall"| DB
What is NAT and how does it work?
NAT (Network Address Translation) translates private IP addresses to public IP addresses. It allows multiple devices to share a single public IP and provides a layer of security by hiding internal network structure.
Types of NAT:
- SNAT (Source NAT): Changes source IP for outbound traffic
- DNAT (Destination NAT): Changes destination IP for inbound traffic
- PAT (Port Address Translation): Many private IPs share one public IP using different ports
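PAT boils down to a translation table keyed by port. The sketch below uses an invented `PatGateway` class and sequential port allocation purely for illustration; real gateways pick ports differently and expire idle mappings.

```python
import itertools
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]   # (ip, port)


class PatGateway:
    """Many private endpoints share one public IP, distinguished by public port."""

    def __init__(self, public_ip: str, first_port: int = 12345) -> None:
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out: Dict[Endpoint, int] = {}    # private endpoint -> public port
        self._back: Dict[int, Endpoint] = {}   # public port -> private endpoint

    def translate_outbound(self, private_ip: str, private_port: int) -> Endpoint:
        """SNAT step: rewrite the source to the public IP and an allocated port."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return (self.public_ip, self._out[key])

    def translate_inbound(self, public_port: int) -> Optional[Endpoint]:
        """Reverse lookup for replies; None means no mapping, so the packet is dropped."""
        return self._back.get(public_port)
```

The `None` case is the "security by default" property: traffic that nobody inside asked for has no table entry to follow.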
flowchart LR
Private["Private<br/>10.0.1.50"] -->|"source IP<br/>changed"| NAT["NAT Gateway<br/>203.0.113.5:12345"]
NAT -->|"translated to<br/>public IP + port"| Internet["example.com"]
NAT Gateway/Instance: Allows private subnets to access the internet without being directly accessible from outside, providing both connectivity and security.
Network Troubleshooting Questions
Systematic debugging is essential for resolving network issues quickly.
What tools do you use for connectivity testing?
Connectivity testing tools help you identify where in the network path a problem occurs. Starting with basic connectivity and working up to application-level tests is the systematic approach.
# Basic connectivity
ping example.com
ping -c 4 example.com # Stop after 4 pings
# Trace route to destination
traceroute example.com # Linux/Mac
tracert example.com # Windows
mtr example.com # Better traceroute (continuous)
# Test specific port
telnet example.com 80
nc -zv example.com 80 # Netcat
nc -zv example.com 20-25 # Port range
How do you check what ports are in use on a system?
Knowing what ports are in use and which processes own them is essential for debugging services that won't start or for security auditing.
# Show listening ports
netstat -tulpn # Linux
netstat -an | grep LISTEN # Mac
# Modern alternative to netstat
ss -tulpn # Show listening ports
ss -s # Socket statistics
# What's using a port?
lsof -i :80 # What process has port 80
fuser 80/tcp # Alternative
How do you capture and analyze network packets?
Packet capture is the ultimate debugging tool—it shows you exactly what's happening on the wire. Use it when higher-level tools don't reveal the problem.
# Capture packets
tcpdump -i eth0 # All traffic on interface
tcpdump -i eth0 port 80 # Only port 80
tcpdump -i eth0 host 10.0.1.50 # Only specific host
tcpdump -i eth0 -w capture.pcap # Save to file
# Read capture file
tcpdump -r capture.pcap
wireshark capture.pcap # GUI analysis
# Useful filters
tcpdump 'tcp[tcpflags] & (tcp-syn) != 0' # Only SYN packets
tcpdump -A port 80 # Show ASCII content
What curl commands are essential for HTTP debugging?
curl is your Swiss Army knife for debugging HTTP issues. Knowing these commands helps you quickly isolate whether problems are with DNS, TLS, or the application.
# Basic request
curl https://example.com
# Show headers
curl -I https://example.com # HEAD request (headers only)
curl -i https://example.com # Include headers in output
# Verbose output (see handshake)
curl -v https://example.com
# Follow redirects
curl -L https://example.com
# Custom headers
curl -H "Authorization: Bearer token" https://api.example.com
# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com
# Time the request
curl -w "@curl-format.txt" -o /dev/null -s https://example.com
# curl-format.txt:
# time_namelookup: %{time_namelookup}s\n
# time_connect: %{time_connect}s\n
# time_appconnect: %{time_appconnect}s\n
# time_total: %{time_total}s\n
Classic Interview Scenario Questions
These scenarios test your end-to-end understanding of networking concepts.
What happens when you type google.com in a browser?
This classic question tests comprehensive understanding of web request lifecycle. Interviewers want to see that you understand each layer and can explain how they connect.
1. URL Parsing: Browser extracts protocol (https), hostname (google.com), path (/)
2. DNS Resolution:
- Check browser cache
- Check OS cache
- Query DNS resolver
- Recursive lookup through root → TLD → authoritative
- Cache result based on TTL
3. TCP Connection:
- Three-way handshake (SYN, SYN-ACK, ACK)
- Connection to IP on port 443
4. TLS Handshake:
- Client Hello (supported ciphers)
- Server Hello (chosen cipher, certificate)
- Certificate verification
- Key exchange
- Encrypted channel established
5. HTTP Request:
- GET / HTTP/2
- Headers (Host, User-Agent, Accept, etc.)
6. Server Processing:
- Load balancer routes request
- Web server processes
- Backend calls if needed
- Response generated
7. Response:
- Status code (200 OK)
- Headers (Content-Type, Cache-Control)
- Body (HTML)
8. Rendering:
- Parse HTML
- Fetch CSS, JS, images (parallel requests)
- Build DOM and CSSOM
- Execute JavaScript
- Paint to screen
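Step 1 is the only part that happens entirely inside the browser, and it can be reproduced with `urllib.parse` from the standard library. In this small sketch, `parse_target` and `DEFAULT_PORTS` are illustrative names, and it assumes the browser has already prepended the scheme to what the user typed.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_target(url: str) -> dict:
    """Extract what the later steps need: scheme, host to resolve, port, and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,                            # input to DNS resolution
        "port": parts.port or DEFAULT_PORTS[parts.scheme], # target of the TCP connection
        "path": parts.path or "/",                         # goes into the request line
    }


print(parse_target("https://google.com"))
```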
How do you systematically debug connectivity issues?
A systematic approach prevents you from chasing red herrings. Work through the network stack layer by layer until you find where it breaks.
# 1. Can we resolve the hostname?
dig api.example.com
# If NXDOMAIN → DNS issue
# 2. Can we reach the IP?
ping 93.184.216.34
# If timeout → routing/firewall issue (or ICMP is blocked; continue to step 3)
# 3. Can we reach the port?
nc -zv 93.184.216.34 443
# If refused → nothing listening on that port (or a firewall is rejecting); if it times out → a firewall is likely dropping packets
# 4. Is TLS working?
openssl s_client -connect api.example.com:443
# If handshake fails → certificate issue
# 5. Does HTTP work?
curl -v https://api.example.com/health
# If error → application issue
How do you design for high availability?
High availability requires eliminating single points of failure at every layer. Interviewers want to see that you understand redundancy, health checking, and graceful degradation.
Key patterns:
- Multiple availability zones: Servers in different data centers
- Load balancer with health checks: Automatically remove failed instances
- DNS failover: Route53 health checks, multiple A records
- Connection draining: Graceful shutdown for existing connections
- Retry with backoff: Client-side resilience
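The last pattern, retry with backoff, is the one most often implemented by hand. The sketch below uses exponential backoff with full jitter: `retry_with_backoff` is an illustrative helper, the defaults are arbitrary, and only `ConnectionError` is treated as retryable here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `operation`, retrying on ConnectionError with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                                   # out of retries: surface the error
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)                      # full jitter: wait 0..ceiling seconds
    raise AssertionError("unreachable")
```

The jitter matters as much as the backoff: without it, every client that failed at the same moment retries at the same moment, and the recovering service gets hit by synchronized waves. Retries should also be limited to idempotent operations.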
flowchart TB
R53["Route53<br/>(DNS with health checks)"]
ALB["ALB<br/>(Cross-zone load balancing)"]
R53 --> ALB
subgraph AZ1["AZ-1"]
App1["App"]
end
subgraph AZ2["AZ-2"]
App2["App"]
end
subgraph AZ3["AZ-3"]
App3["App"]
end
ALB --> App1
ALB --> App2
ALB --> App3
Quick Reference
What are the essential networking commands?
These commands cover the most common debugging scenarios. Memorizing them allows you to quickly diagnose issues.
| Task | Command |
|---|---|
| DNS lookup | dig example.com |
| Trace route | mtr example.com |
| Test port | nc -zv host port |
| Show listening ports | ss -tulpn |
| Capture packets | tcpdump -i eth0 |
| HTTP request | curl -v https://example.com |
| Check certificate | openssl s_client -connect host:443 |
What is the quick reference for common ports?
22 SSH 443 HTTPS 6379 Redis
80 HTTP 3306 MySQL 27017 MongoDB
53 DNS 5432 PostgreSQL 9200 Elasticsearch
25 SMTP 8080 Alt HTTP 2379 etcd
What is your troubleshooting checklist?
□ DNS resolving correctly?
□ IP reachable (ping)?
□ Port open (nc/telnet)?
□ Firewall rules allow traffic?
□ Service running on target?
□ TLS certificate valid?
□ Application responding?
□ Correct response code?
