Relational databases remain the backbone of most applications. While basic SQL is table stakes, interviews for senior backend roles probe deeper—indexing strategies, query optimization, transaction isolation, and operational knowledge separate strong candidates from the rest.
This guide goes beyond JOINs to cover the advanced database concepts that interviewers actually ask about.
1. PostgreSQL vs MySQL
Understanding when to choose each database demonstrates architectural thinking.
Architecture Comparison
What are the key architectural differences between PostgreSQL and MySQL?
| Aspect | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| Process model | Process per connection | Thread per connection |
| MVCC implementation | Stores versions in main table | Stores versions in undo logs |
| Replication | Logical + streaming | Binary log replication |
| Default isolation | READ COMMITTED | REPEATABLE READ |
| JSON support | JSONB with GIN indexing | JSON with limited indexing |
| Full-text search | Built-in, configurable | Built-in, simpler |
| Extensions | Rich ecosystem (PostGIS, etc.) | Limited |
When to Choose Each
When would you choose PostgreSQL over MySQL?
Choose PostgreSQL when you need:
- Complex queries with CTEs, window functions, lateral joins
- JSONB with indexing for semi-structured data
- Geospatial data (PostGIS)
- Strong SQL standards compliance
- Custom types, functions, and extensions
- Advanced indexing (partial, expression, GIN/GiST)
-- PostgreSQL strengths: Window functions with complex logic
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
amount - LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as change_from_previous
FROM orders;
-- JSONB querying with indexing
CREATE INDEX idx_orders_metadata ON orders USING GIN (metadata jsonb_path_ops);
SELECT * FROM orders WHERE metadata @> '{"priority": "high"}';
Choose MySQL when you need:
- Simpler operational model
- Broader hosting/cloud support
- Read-heavy workloads with read replicas
- Simpler replication setup
- Large ecosystem of tools and ORMs
-- MySQL strengths: Simple read replica setup
-- Primary
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='primary.example.com',
SOURCE_USER='repl',
SOURCE_PASSWORD='password',
SOURCE_AUTO_POSITION=1;
START REPLICA;
2. Indexing Strategies
Indexing is the most impactful performance optimization—and a frequent interview topic.
B-Tree Index Fundamentals
How does a B-tree index work?
B-Tree Index Structure (simplified):

                 ┌─────────────┐
                 │     [M]     │  Root
                 └──────┬──────┘
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │ [D, H]  │    │ [P, T]  │    │   [X]   │  Internal
    └────┬────┘    └────┬────┘    └────┬────┘
         │              │              │
   ┌───┬─┴──┬───┐      ...            ...
   ▼   ▼    ▼   ▼
┌─────┐┌─────┐┌─────┐┌─────┐
│A,B,C││E,F,G││I,J,K││L    │  Leaf nodes (contain row pointers)
└─────┘└─────┘└─────┘└─────┘
Key properties:
- O(log n) lookup time vs O(n) for full scan
- Sorted order enables range queries and ORDER BY
- Leaf nodes are linked for efficient range scans
- Works for: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%'
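These properties can be mimicked with a sorted Python list and `bisect` — a toy stand-in for the leaf level of a B-tree, not a real index:

```python
import bisect

# Sorted keys stand in for the index's leaf level; values are "row pointers".
index = sorted([("alice", 101), ("bob", 102), ("carol", 103),
                ("dave", 104), ("erin", 105)])
keys = [k for k, _ in index]

# Point lookup: binary search, O(log n) comparisons instead of a full scan
pos = bisect.bisect_left(keys, "carol")
match = index[pos] if pos < len(keys) and keys[pos] == "carol" else None

# Range scan (BETWEEN 'bob' AND 'dave'): find the start once, walk the leaves
range_rows = index[bisect.bisect_left(keys, "bob"):bisect.bisect_right(keys, "dave")]

# LIKE 'da%' is just a range scan from 'da' up to (not including) 'db'
prefix_rows = index[bisect.bisect_left(keys, "da"):bisect.bisect_left(keys, "db")]

print(match)        # ('carol', 103)
print(range_rows)   # [('bob', 102), ('carol', 103), ('dave', 104)]
print(prefix_rows)  # [('dave', 104)]
```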
What is a composite index and how does column order matter?
-- Composite index on (country, city, created_at)
CREATE INDEX idx_location_date ON users(country, city, created_at);
-- Uses full index (all 3 columns)
SELECT * FROM users
WHERE country = 'US' AND city = 'NYC' AND created_at > '2024-01-01';
-- Uses first 2 columns
SELECT * FROM users WHERE country = 'US' AND city = 'NYC';
-- Uses first column only
SELECT * FROM users WHERE country = 'US';
-- CANNOT use index efficiently (skips first column)
SELECT * FROM users WHERE city = 'NYC'; -- Full scan!
-- Only partially uses index (middle column 'city' is skipped)
SELECT * FROM users WHERE country = 'US' AND created_at > '2024-01-01';
-- Only uses 'country', then scans
Index column order rule: Place columns in order of:
- Equality conditions first (=)
- Range conditions last (<, >, BETWEEN)
- Higher cardinality before lower (usually)
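The ordering rule follows from how composite keys sort. A sketch with Python tuples (the `"US\x00"` upper bound is just an illustration trick for "everything starting with US"):

```python
import bisect

# A composite index on (country, city) behaves like a list of tuples
# sorted by country first, then city.
rows = sorted([("DE", "Berlin"), ("UK", "London"), ("US", "Austin"),
               ("US", "NYC"), ("US", "NYC"), ("US", "SF")])

# Leading-column predicate (country = 'US'): binary search on the prefix works.
# ("US\x00",) is a bound sorting just above every ("US", ...) tuple.
start = bisect.bisect_left(rows, ("US",))
end = bisect.bisect_left(rows, ("US\x00",))
us_rows = rows[start:end]

# Second-column predicate alone (city = 'NYC'): the sort order is useless,
# so every entry must be checked -- the "skips first column" full scan.
nyc_rows = [r for r in rows if r[1] == "NYC"]

print(us_rows)   # the four US entries, found without scanning DE/UK
print(nyc_rows)  # found only by checking all rows
```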
Advanced Index Types
What index types are available beyond B-tree?
-- PostgreSQL index types
-- Hash index: Only equality, faster for = but not <, >
CREATE INDEX idx_email_hash ON users USING HASH (email);
-- GIN (Generalized Inverted Index): Arrays, JSONB, full-text
CREATE INDEX idx_tags ON articles USING GIN (tags);
CREATE INDEX idx_metadata ON products USING GIN (metadata jsonb_path_ops);
CREATE INDEX idx_search ON documents USING GIN (to_tsvector('english', content));
-- GiST (Generalized Search Tree): Geometric, range types, full-text
CREATE INDEX idx_location ON places USING GIST (coordinates);
CREATE INDEX idx_schedule ON events USING GIST (time_range);
-- BRIN (Block Range Index): Large sequential data, minimal storage
CREATE INDEX idx_created ON logs USING BRIN (created_at);
| Index Type | Use Case | Size | Query Types |
|---|---|---|---|
| B-tree | General purpose | Medium | =, <, >, range, ORDER BY |
| Hash | Equality only | Small | = only |
| GIN | Arrays, JSONB, text | Large | Contains, overlap |
| GiST | Geometric, ranges | Medium | Spatial, range overlap |
| BRIN | Time-series, sequential | Tiny | Range on sorted data |
Covering and Partial Indexes
What is a covering index (index-only scan)?
-- Regular index: Must fetch row from table
CREATE INDEX idx_email ON users(email);
SELECT name FROM users WHERE email = 'test@example.com';
-- Index lookup → Row fetch → Return name
-- Covering index: All data in index
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);
SELECT name FROM users WHERE email = 'test@example.com';
-- Index lookup → Return name (no table access!)
-- PostgreSQL INCLUDE syntax (non-key columns)
CREATE INDEX idx_order_lookup ON orders(customer_id)
INCLUDE (order_date, total);
What is a partial index and when would you use it?
-- Index only active users (much smaller than full index)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(customer_id, created_at)
WHERE created_at > '2024-01-01';
-- Index only non-null values
CREATE INDEX idx_phone ON users(phone)
WHERE phone IS NOT NULL;
-- Query must match WHERE clause to use partial index
SELECT * FROM users WHERE email = 'test@example.com' AND status = 'active';
3. Query Optimization
Reading execution plans is essential for diagnosing performance issues.
EXPLAIN ANALYZE
How do you read an execution plan?
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Limit  (cost=1250.23..1250.26 rows=10 width=48) (actual time=45.2..45.3 rows=10 loops=1)
  ->  Sort  (cost=1250.23..1256.73 rows=2600 width=48) (actual time=45.2..45.2 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1150.00..1189.00 rows=2600 width=48) (actual time=42.1..44.8 rows=2600 loops=1)
              Group Key: u.id
              ->  Hash Right Join  (cost=85.50..1020.00 rows=26000 width=44) (actual time=1.2..28.5 rows=26000 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..620.00 rows=26000 width=8) (actual time=0.01..8.2 rows=26000 loops=1)
                    ->  Hash  (cost=73.00..73.00 rows=1000 width=40) (actual time=1.1..1.1 rows=1000 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 72kB
                          ->  Seq Scan on users u  (cost=0.00..73.00 rows=1000 width=40) (actual time=0.02..0.8 rows=1000 loops=1)
                                Filter: (country = 'US'::text)
                                Rows Removed by Filter: 9000
Planning Time: 0.5 ms
Execution Time: 45.5 ms
Key things to look for:
- Seq Scan on large tables → Consider adding index
- Nested Loop with high row counts → May need different join type
- actual rows >> rows → Statistics are stale, run ANALYZE
- Sort Method: external merge → Not enough work_mem
Common Anti-Patterns
What query patterns prevent index usage?
-- Anti-pattern 1: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com'; -- No index!
-- Fix: Expression index
CREATE INDEX idx_email_lower ON users(LOWER(email));
-- Anti-pattern 2: Implicit type conversion
SELECT * FROM orders WHERE order_id = '12345'; -- order_id is INT
-- Fix: Use correct type
SELECT * FROM orders WHERE order_id = 12345;
-- Anti-pattern 3: Leading wildcard
SELECT * FROM users WHERE name LIKE '%smith'; -- Full scan!
-- Fix: Full-text search or trigram index (requires the pg_trgm extension)
CREATE INDEX idx_name_trgm ON users USING GIN (name gin_trgm_ops);
-- Anti-pattern 4: OR on different columns
SELECT * FROM users WHERE email = 'a@b.com' OR phone = '555-1234';
-- Fix: UNION or separate indexes with bitmap scan
SELECT * FROM users WHERE email = 'a@b.com'
UNION
SELECT * FROM users WHERE phone = '555-1234';
-- Anti-pattern 5: NOT IN with NULLs
SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM inactive_customers);
-- Fix: NOT EXISTS or LEFT JOIN IS NULL
SELECT o.* FROM orders o
LEFT JOIN inactive_customers ic ON o.customer_id = ic.id
WHERE ic.id IS NULL;
N+1 Query Problem
What is the N+1 query problem in SQL?
-- N+1 pattern in application code (pseudocode):
users = query("SELECT * FROM users WHERE country = 'US'") -- 1 query
for user in users:
orders = query("SELECT * FROM orders WHERE user_id = ?", user.id) -- N queries
-- Total: 1 + N queries (if 1000 users, that's 1001 queries!)
-- Solution: JOIN or subquery
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US'; -- 1 query
-- Or with aggregation
SELECT
u.*,
COALESCE(order_stats.count, 0) as order_count,
COALESCE(order_stats.total, 0) as order_total
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) as count, SUM(amount) as total
FROM orders
GROUP BY user_id
) order_stats ON order_stats.user_id = u.id
WHERE u.country = 'US';
4. Transactions & Locking
Understanding transactions is critical for data integrity and concurrency.
ACID Properties
Explain ACID properties with examples.
- Atomicity: All or nothing
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If either fails, both are rolled back
COMMIT;
- Consistency: Database moves from one valid state to another
-- Constraints ensure consistency
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
-- Transfer that would make balance negative will fail
- Isolation: Concurrent transactions don't interfere
-- Transaction A reads balance, Transaction B updates it
-- Isolation level determines what A sees
- Durability: Committed changes survive failures
COMMIT; -- After this, data is safe even if server crashes
Isolation Levels
What are the four isolation levels and their trade-offs?
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Yes | Yes | Yes | Fastest |
| READ COMMITTED | No | Yes | Yes | Fast |
| REPEATABLE READ | No | No | Yes* | Medium |
| SERIALIZABLE | No | No | No | Slowest |
*PostgreSQL's REPEATABLE READ also prevents phantom reads via MVCC.
-- Set isolation level
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- All reads in this transaction see consistent snapshot
SELECT balance FROM accounts WHERE id = 1; -- Returns 1000
-- Even if another transaction commits a change...
SELECT balance FROM accounts WHERE id = 1; -- Still returns 1000
COMMIT;
-- Serializable: Full isolation, may fail with serialization error
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts; -- Snapshot taken; conflicting concurrent writes are detected at commit
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
COMMIT; -- May fail: "could not serialize access"
MVCC Deep Dive
How does PostgreSQL implement MVCC?
Row Version Structure:

┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (100)   │   (0)    │  id=1, name='Alice'      │  Created by txn 100
└──────────┴──────────┴──────────────────────────┘

After UPDATE by transaction 150:

┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (100)   │  (150)   │  id=1, name='Alice'      │  Old version (dead)
└──────────┴──────────┴──────────────────────────┘
┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (150)   │   (0)    │  id=1, name='Alice Smith'│  New version
└──────────┴──────────┴──────────────────────────┘
Visibility rules:
- Row is visible if: xmin is committed AND (xmax = 0 OR xmax is not committed OR xmax > my_txn_id)
- Each transaction sees a consistent snapshot
- Dead rows cleaned up by VACUUM
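The visibility rule above can be sketched as a small function (a deliberate simplification of PostgreSQL's actual heap-tuple visibility checks; the `committed` set stands in for the commit log):

```python
def row_visible(xmin, xmax, my_txn_id, committed):
    """Simplified MVCC visibility check.

    A row version is visible if its creating transaction committed and it
    has not been deleted by a committed transaction this snapshot can see.
    """
    if xmin not in committed:
        return False               # creator never committed
    if xmax == 0:
        return True                # never deleted/updated
    if xmax not in committed:
        return True                # deleter still in flight
    return xmax > my_txn_id        # deleted after my snapshot started

committed = {100, 150}

# After the UPDATE above, a later transaction (id 200) sees only the new version:
print(row_visible(100, 150, 200, committed))  # False (old version, dead)
print(row_visible(150, 0, 200, committed))    # True  (new version)

# A transaction that took its snapshot before txn 150 still sees the old version:
print(row_visible(100, 150, 120, committed))  # True
```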
-- Check transaction ID
SELECT txid_current();
-- See dead rows (before vacuum)
SELECT * FROM pg_stat_user_tables WHERE relname = 'accounts';
-- n_dead_tup shows dead rows needing vacuum
Locking Strategies
What's the difference between optimistic and pessimistic locking?
-- Pessimistic locking: Lock row immediately
BEGIN;
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE;
-- Row is locked, other transactions wait
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1;
COMMIT;
-- FOR UPDATE variants:
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE NOWAIT; -- Fail immediately if locked
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE SKIP LOCKED; -- Skip locked rows
-- Optimistic locking: Check version at update time
-- Application maintains version column
UPDATE inventory
SET quantity = quantity - 1, version = version + 1
WHERE product_id = 1 AND version = 5;
-- If 0 rows affected, someone else modified it → retry
-- With timestamp
UPDATE inventory
SET quantity = quantity - 1, updated_at = NOW()
WHERE product_id = 1 AND updated_at = '2024-01-15 10:30:00';
| Approach | Pros | Cons | Use When |
|---|---|---|---|
| Pessimistic | Guaranteed consistency | Blocking, potential deadlocks | High contention, short transactions |
| Optimistic | No blocking, better throughput | Retry logic needed | Low contention, read-heavy |
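The optimistic pattern is easy to demonstrate end-to-end with SQLite (a sketch; the table and function names are illustrative):

```python
import sqlite3

# Optimistic locking sketch: the UPDATE succeeds only if the version we
# read earlier is still current; 0 rows affected means "someone else won, retry".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INT, quantity INT, version INT)")
conn.execute("INSERT INTO inventory VALUES (1, 10, 5)")

def decrement(conn, product_id, expected_version):
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - 1, version = version + 1 "
        "WHERE product_id = ? AND version = ?",
        (product_id, expected_version),
    )
    return cur.rowcount == 1  # False -> concurrent modification, caller retries

first = decrement(conn, 1, 5)   # version matched: quantity 9, version 6
second = decrement(conn, 1, 5)  # stale version: no rows affected
row = conn.execute("SELECT quantity, version FROM inventory").fetchone()
print(first, second, row)  # True False (9, 6)
```

On the `second` call the caller would re-read the row to pick up version 6, then retry the update.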
5. Advanced Data Types
Modern databases offer powerful data types beyond basic scalars.
JSON and JSONB
When should you use JSON vs JSONB in PostgreSQL?
-- JSONB: Binary storage, most common choice
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
attributes JSONB -- Flexible schema for product attributes
);
-- Insert JSONB
INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "storage": "512GB"}}');
-- Query JSONB
SELECT * FROM products WHERE attributes->>'brand' = 'Dell';
SELECT * FROM products WHERE attributes @> '{"brand": "Dell"}';
SELECT * FROM products WHERE attributes->'specs'->>'ram' = '16';
-- JSONB operators
-- -> : Get JSON element (returns JSON)
-- ->> : Get JSON element as text
-- @> : Contains
-- ? : Key exists
-- ?| : Any key exists
-- ?& : All keys exist
-- Index JSONB for fast queries
CREATE INDEX idx_attributes ON products USING GIN (attributes);
CREATE INDEX idx_brand ON products USING GIN ((attributes->'brand'));
-- JSONB vs JSON
-- JSON: Preserves whitespace, key order, duplicates. Parsed on each access.
-- JSONB: Binary format, duplicate keys removed, indexable. Faster queries, slightly slower insert.
How do you query nested JSONB efficiently?
-- Path queries with jsonb_path_query
SELECT * FROM products
WHERE jsonb_path_exists(attributes, '$.specs.ram ? (@ > 8)');
-- Aggregate JSON
SELECT
attributes->>'brand' as brand,
COUNT(*),
AVG((attributes->'specs'->>'ram')::int) as avg_ram
FROM products
GROUP BY attributes->>'brand';
-- Update nested JSONB
UPDATE products
SET attributes = jsonb_set(attributes, '{specs,ram}', '32')
WHERE id = 1;
-- Add key
UPDATE products
SET attributes = attributes || '{"warranty": "2 years"}'
WHERE id = 1;
-- Remove key
UPDATE products
SET attributes = attributes - 'warranty'
WHERE id = 1;
Arrays
-- Array column
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT[]
);
INSERT INTO articles (title, tags) VALUES
('PostgreSQL Tips', ARRAY['database', 'postgresql', 'performance']);
-- Query arrays
SELECT * FROM articles WHERE 'postgresql' = ANY(tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'database'];
SELECT * FROM articles WHERE tags && ARRAY['mysql', 'postgresql']; -- Overlap
-- Array functions
SELECT array_length(tags, 1) FROM articles;
SELECT unnest(tags) FROM articles; -- Expand to rows
-- Index for array queries
CREATE INDEX idx_tags ON articles USING GIN (tags);
Full-Text Search
-- Create text search column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
to_tsvector('english', title || ' ' || COALESCE(content, ''));
CREATE INDEX idx_search ON articles USING GIN (search_vector);
-- Search with ranking
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Phrase search
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'database performance');
-- Auto-update with trigger
CREATE TRIGGER update_search_vector
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, content);
6. Replication & High Availability
Production databases need redundancy and failover capabilities.
PostgreSQL Streaming Replication
Primary-Replica Architecture:

┌─────────────────┐         ┌─────────────────┐
│     Primary     │   WAL   │     Replica     │
│                 │────────▶│                 │
│   Read/Write    │ Stream  │    Read Only    │
└─────────────────┘         └─────────────────┘
        │                           │
        ▼                           ▼
   ┌─────────┐                 ┌─────────┐
   │   App   │                 │   App   │
   │ (Write) │                 │  (Read) │
   └─────────┘                 └─────────┘
# Primary configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
synchronous_commit = on  # or 'remote_apply' with a synchronous replica
-- Replica setup
-- pg_basebackup -h primary -D /var/lib/postgresql/data -U replicator -P
-- standby.signal file indicates replica mode
-- primary_conninfo in postgresql.auto.conf
-- Check replication status (on primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
-- Check replication lag (run on the replica)
SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;
MySQL Replication
# Source (primary) configuration (my.cnf)
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
# Replica configuration (my.cnf)
[mysqld]
server-id = 2
relay_log = relay-bin
read_only = ON
gtid_mode = ON
enforce_gtid_consistency = ON
-- Set up replication on replica
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'primary.example.com',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'password',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
-- Check status
SHOW REPLICA STATUS\G
Connection Pooling
Why is connection pooling important?
Without pooling:

App Server                         Database
┌─────────┐    100 requests       ┌─────────┐
│         │──────────────────────▶│  100    │
│   App   │    100 connections    │  conns  │  Memory exhaustion!
│         │                       │         │
└─────────┘                       └─────────┘

With pooling (PgBouncer):

App Server         Pooler             Database
┌─────────┐      ┌─────────┐         ┌─────────┐
│         │ 100  │         │   20    │         │
│   App   │─────▶│PgBouncer│────────▶│   20    │
│         │conns │  pool   │  conns  │  conns  │
└─────────┘      └─────────┘         └─────────┘
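The core idea — many clients multiplexed over a few connections — can be sketched with a queue-backed toy pool (illustrative only, not how PgBouncer is implemented):

```python
import queue
import threading

class Pool:
    """Toy connection pool: N clients share a fixed set of 'connections'."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put("conn-%d" % i)  # stand-in for a real connection

    def run(self, work):
        conn = self._free.get()            # blocks if all connections are busy
        try:
            return work(conn)
        finally:
            self._free.put(conn)           # returned to the pool, not closed

pool = Pool(size=5)
used = set()

def client():
    pool.run(lambda conn: used.add(conn))

threads = [threading.Thread(target=client) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 100 concurrent clients, yet the database-side connection count never exceeds 5
print(len(used) <= 5)  # True
```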
# PgBouncer configuration
[databases]
mydb = host=localhost dbname=mydb
[pgbouncer]
pool_mode = transaction # or session, statement
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
# Pool modes:
# session: Connection held until client disconnects
# transaction: Connection returned after each transaction (most common)
# statement: Connection returned after each statement (no transactions)
7. Performance Tuning
Proper configuration significantly impacts database performance.
Key Configuration Parameters
PostgreSQL tuning:
# Memory
shared_buffers = 4GB # 25% of RAM for dedicated server
effective_cache_size = 12GB # 75% of RAM (estimate of OS cache)
work_mem = 256MB # Per-operation memory (sorts, hashes)
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
# WAL
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Planner
random_page_cost = 1.1 # Lower for SSD (default 4.0)
effective_io_concurrency = 200 # For SSD
# Connections
max_connections = 200 # Use connection pooling for more
MySQL/InnoDB tuning:
# Buffer pool (most important)
innodb_buffer_pool_size = 12G # 70-80% of RAM
innodb_buffer_pool_instances = 8 # 1 per GB up to 8
# Logging
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1 # 1=durable, 2=faster
# I/O
innodb_io_capacity = 2000 # SSD
innodb_io_capacity_max = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
Table Partitioning
When and how should you partition tables?
-- PostgreSQL declarative partitioning
CREATE TABLE orders (
id BIGSERIAL,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Queries automatically route to correct partition
SELECT * FROM orders WHERE order_date = '2024-02-15';
-- Only scans orders_2024_q1
-- Partition pruning
EXPLAIN SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- Shows only orders_2024_q1 scanned
Partition strategies:
- Range: Date, numeric ranges (most common)
- List: Discrete values (country, status)
- Hash: Even distribution (no natural range)
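Range partition pruning amounts to a bounds check against each partition's range; a toy router in Python (partition names match the example above):

```python
from datetime import date

# Each range partition owns a half-open interval [start, end),
# mirroring FOR VALUES FROM ... TO ... above.
partitions = [
    ("orders_2024_q1", date(2024, 1, 1), date(2024, 4, 1)),
    ("orders_2024_q2", date(2024, 4, 1), date(2024, 7, 1)),
]

def route(order_date):
    # The planner's partition pruning performs the same bounds comparison
    for name, start, end in partitions:
        if start <= order_date < end:
            return name
    raise ValueError("no partition covers %s" % order_date)

print(route(date(2024, 2, 15)))  # orders_2024_q1
print(route(date(2024, 5, 1)))   # orders_2024_q2
```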
VACUUM and ANALYZE
Why is VACUUM important in PostgreSQL?
-- VACUUM: Reclaims dead row space
VACUUM orders; -- Standard vacuum
VACUUM FULL orders; -- Rewrites table, locks it
VACUUM ANALYZE orders; -- Vacuum + update statistics
-- Check vacuum status
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables;
-- ANALYZE: Updates query planner statistics
ANALYZE orders;
-- Auto-vacuum settings
-- autovacuum_vacuum_threshold = 50
-- autovacuum_vacuum_scale_factor = 0.2
-- Vacuum when: dead_tuples > threshold + scale_factor * table_size
-- For high-write tables, tune per-table
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02
);
Quick Reference: Common Interview Questions
| Topic | Key Points |
|---|---|
| PostgreSQL vs MySQL | Features, MVCC implementation, use cases |
| B-tree index | O(log n), sorted, composite column order matters |
| Index types | B-tree, Hash, GIN, GiST, BRIN - know when to use each |
| EXPLAIN ANALYZE | Read plans, identify seq scans, check row estimates |
| Isolation levels | RC, RR, Serializable - trade-offs |
| MVCC | xmin/xmax, snapshots, VACUUM |
| Locking | Optimistic vs pessimistic, FOR UPDATE |
| JSONB | Operators (@>, ?), GIN indexing |
| Replication | Streaming, sync vs async, lag monitoring |
| Tuning | shared_buffers, work_mem, connection pooling |
Related Articles
- SQL Joins Interview Guide - JOIN fundamentals
- System Design Interview Guide - Database architecture patterns
- Hibernate & JPA Interview Guide - ORM integration
- Complete Java Backend Developer Interview Guide - Full Java backend preparation
