Relational databases remain the backbone of most applications. While basic SQL is table stakes, interviews for senior backend roles probe deeper—indexing strategies, query optimization, transaction isolation, and operational knowledge separate strong candidates from the rest.
This guide goes beyond JOINs to cover the advanced database concepts that interviewers actually ask about.
1. PostgreSQL vs MySQL
Understanding when to choose each database demonstrates architectural thinking.
Architecture Comparison
What are the key architectural differences between PostgreSQL and MySQL?
| Aspect | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| Process model | Process per connection | Thread per connection |
| MVCC implementation | Stores versions in main table | Stores versions in undo logs |
| Replication | Logical + streaming | Binary log replication |
| Default isolation | READ COMMITTED | REPEATABLE READ |
| JSON support | JSONB with GIN indexing | JSON with limited indexing |
| Full-text search | Built-in, configurable | Built-in, simpler |
| Extensions | Rich ecosystem (PostGIS, etc.) | Limited |
When to Choose Each
When would you choose PostgreSQL over MySQL?
Choose PostgreSQL when you need:
- Complex queries with CTEs, window functions, lateral joins
- JSONB with indexing for semi-structured data
- Geospatial data (PostGIS)
- Strong SQL standards compliance
- Custom types, functions, and extensions
- Advanced indexing (partial, expression, GIN/GiST)
-- PostgreSQL strengths: Window functions with complex logic
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
amount - LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as change_from_previous
FROM orders;
-- JSONB querying with indexing
CREATE INDEX idx_orders_metadata ON orders USING GIN (metadata jsonb_path_ops);
SELECT * FROM orders WHERE metadata @> '{"priority": "high"}';
Choose MySQL when you need:
- Simpler operational model
- Broader hosting/cloud support
- Read-heavy workloads with read replicas
- Simpler replication setup
- Large ecosystem of tools and ORMs
-- MySQL strengths: Simple read replica setup
-- Primary
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='primary.example.com',
SOURCE_USER='repl',
SOURCE_PASSWORD='password',
SOURCE_AUTO_POSITION=1;
START REPLICA;
2. Indexing Strategies
Indexing is the most impactful performance optimization—and a frequent interview topic.
B-Tree Index Fundamentals
How does a B-tree index work?
B-Tree Index Structure (simplified):

                 ┌─────────────┐
                 │     [M]     │  Root
                 └──────┬──────┘
         ┌──────────────┼──────────────┐
         ▼              ▼              ▼
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │ [D, H]  │    │ [P, T]  │    │   [X]   │  Internal
    └────┬────┘    └────┬────┘    └────┬────┘
         │              │              │
   ┌───┬─┴──┬───┐      ...            ...
   ▼   ▼    ▼   ▼
┌─────┐┌─────┐┌─────┐┌─────┐
│A,B,C││E,F,G││I,J,K││L    │  Leaf nodes (contain row pointers)
└─────┘└─────┘└─────┘└─────┘
Key properties:
- O(log n) lookup time vs O(n) for full scan
- Sorted order enables range queries and ORDER BY
- Leaf nodes are linked for efficient range scans
- Works for: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%'
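These properties can be mimicked with a sorted Python list and `bisect` — a toy stand-in for the leaf level of a B-tree, not a real index:

```python
import bisect

# Sorted keys stand in for the index's leaf level; values are "row pointers".
index = sorted([("alice", 101), ("bob", 102), ("carol", 103),
                ("dave", 104), ("erin", 105)])
keys = [k for k, _ in index]

# Point lookup: binary search, O(log n) comparisons instead of a full scan
pos = bisect.bisect_left(keys, "carol")
match = index[pos] if pos < len(keys) and keys[pos] == "carol" else None

# Range scan (BETWEEN 'bob' AND 'dave'): find the start once, walk the leaves
range_rows = index[bisect.bisect_left(keys, "bob"):bisect.bisect_right(keys, "dave")]

# LIKE 'da%' is just a range scan from 'da' up to (not including) 'db'
prefix_rows = index[bisect.bisect_left(keys, "da"):bisect.bisect_left(keys, "db")]

print(match)        # ('carol', 103)
print(range_rows)   # [('bob', 102), ('carol', 103), ('dave', 104)]
print(prefix_rows)  # [('dave', 104)]
```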
What is a composite index and how does column order matter?
-- Composite index on (country, city, created_at)
CREATE INDEX idx_location_date ON users(country, city, created_at);
-- Uses full index (all 3 columns)
SELECT * FROM users
WHERE country = 'US' AND city = 'NYC' AND created_at > '2024-01-01';
-- Uses first 2 columns
SELECT * FROM users WHERE country = 'US' AND city = 'NYC';
-- Uses first column only
SELECT * FROM users WHERE country = 'US';
-- CANNOT use index efficiently (skips first column)
SELECT * FROM users WHERE city = 'NYC'; -- Full scan!
-- Only partially uses index (middle column 'city' is skipped)
SELECT * FROM users WHERE country = 'US' AND created_at > '2024-01-01';
-- Only uses 'country', then scans
Index column order rule: Place columns in order of:
- Equality conditions first (=)
- Range conditions last (<, >, BETWEEN)
- Higher cardinality before lower (usually)
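The ordering rule follows from how composite keys sort. A sketch with Python tuples (the `"US\x00"` upper bound is just an illustration trick for "everything starting with US"):

```python
import bisect

# A composite index on (country, city) behaves like a list of tuples
# sorted by country first, then city.
rows = sorted([("DE", "Berlin"), ("UK", "London"), ("US", "Austin"),
               ("US", "NYC"), ("US", "NYC"), ("US", "SF")])

# Leading-column predicate (country = 'US'): binary search on the prefix works.
# ("US\x00",) is a bound sorting just above every ("US", ...) tuple.
start = bisect.bisect_left(rows, ("US",))
end = bisect.bisect_left(rows, ("US\x00",))
us_rows = rows[start:end]

# Second-column predicate alone (city = 'NYC'): the sort order is useless,
# so every entry must be checked -- the "skips first column" full scan.
nyc_rows = [r for r in rows if r[1] == "NYC"]

print(us_rows)   # the four US entries, found without scanning DE/UK
print(nyc_rows)  # found only by checking all rows
```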
Advanced Index Types
What index types are available beyond B-tree?
-- PostgreSQL index types
-- Hash index: Only equality, faster for = but not <, >
CREATE INDEX idx_email_hash ON users USING HASH (email);
-- GIN (Generalized Inverted Index): Arrays, JSONB, full-text
CREATE INDEX idx_tags ON articles USING GIN (tags);
CREATE INDEX idx_metadata ON products USING GIN (metadata jsonb_path_ops);
CREATE INDEX idx_search ON documents USING GIN (to_tsvector('english', content));
-- GiST (Generalized Search Tree): Geometric, range types, full-text
CREATE INDEX idx_location ON places USING GIST (coordinates);
CREATE INDEX idx_schedule ON events USING GIST (time_range);
-- BRIN (Block Range Index): Large sequential data, minimal storage
CREATE INDEX idx_created ON logs USING BRIN (created_at);
| Index Type | Use Case | Size | Query Types |
|---|---|---|---|
| B-tree | General purpose | Medium | =, <, >, range, ORDER BY |
| Hash | Equality only | Small | = only |
| GIN | Arrays, JSONB, text | Large | Contains, overlap |
| GiST | Geometric, ranges | Medium | Spatial, range overlap |
| BRIN | Time-series, sequential | Tiny | Range on sorted data |
Covering and Partial Indexes
What is a covering index (index-only scan)?
-- Regular index: Must fetch row from table
CREATE INDEX idx_email ON users(email);
SELECT name FROM users WHERE email = 'test@example.com';
-- Index lookup → Row fetch → Return name
-- Covering index: All data in index
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);
SELECT name FROM users WHERE email = 'test@example.com';
-- Index lookup → Return name (no table access!)
-- PostgreSQL INCLUDE syntax (non-key columns)
CREATE INDEX idx_order_lookup ON orders(customer_id)
INCLUDE (order_date, total);
What is a partial index and when would you use it?
-- Index only active users (much smaller than full index)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(customer_id, created_at)
WHERE created_at > '2024-01-01';
-- Index only non-null values
CREATE INDEX idx_phone ON users(phone)
WHERE phone IS NOT NULL;
-- Query must match WHERE clause to use partial index
SELECT * FROM users WHERE email = 'test@example.com' AND status = 'active';
3. Query Optimization
Reading execution plans is essential for diagnosing performance issues.
EXPLAIN ANALYZE
How do you read an execution plan?
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Limit  (cost=1250.23..1250.26 rows=10 width=48) (actual time=45.2..45.3 rows=10 loops=1)
  ->  Sort  (cost=1250.23..1256.73 rows=2600 width=48) (actual time=45.2..45.2 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1150.00..1189.00 rows=2600 width=48) (actual time=42.1..44.8 rows=2600 loops=1)
              Group Key: u.id
              ->  Hash Right Join  (cost=85.50..1020.00 rows=26000 width=44) (actual time=1.2..28.5 rows=26000 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..620.00 rows=26000 width=8) (actual time=0.01..8.2 rows=26000 loops=1)
                    ->  Hash  (cost=73.00..73.00 rows=1000 width=40) (actual time=1.1..1.1 rows=1000 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 72kB
                          ->  Seq Scan on users u  (cost=0.00..73.00 rows=1000 width=40) (actual time=0.02..0.8 rows=1000 loops=1)
                                Filter: (country = 'US'::text)
                                Rows Removed by Filter: 9000
Planning Time: 0.5 ms
Execution Time: 45.5 ms
Key things to look for:
- Seq Scan on large tables → Consider adding index
- Nested Loop with high row counts → May need different join type
- actual rows >> rows → Statistics are stale, run ANALYZE
- Sort Method: external merge → Not enough work_mem
Common Anti-Patterns
What query patterns prevent index usage?
-- Anti-pattern 1: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com'; -- No index!
-- Fix: Expression index
CREATE INDEX idx_email_lower ON users(LOWER(email));
-- Anti-pattern 2: Implicit type conversion
SELECT * FROM orders WHERE order_id = '12345'; -- order_id is INT
-- Fix: Use correct type
SELECT * FROM orders WHERE order_id = 12345;
-- Anti-pattern 3: Leading wildcard
SELECT * FROM users WHERE name LIKE '%smith'; -- Full scan!
-- Fix: Full-text search or trigram index (requires the pg_trgm extension)
CREATE INDEX idx_name_trgm ON users USING GIN (name gin_trgm_ops);
-- Anti-pattern 4: OR on different columns
SELECT * FROM users WHERE email = 'a@b.com' OR phone = '555-1234';
-- Fix: UNION or separate indexes with bitmap scan
SELECT * FROM users WHERE email = 'a@b.com'
UNION
SELECT * FROM users WHERE phone = '555-1234';
-- Anti-pattern 5: NOT IN with NULLs
SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM inactive_customers);
-- Fix: NOT EXISTS or LEFT JOIN IS NULL
SELECT o.* FROM orders o
LEFT JOIN inactive_customers ic ON o.customer_id = ic.id
WHERE ic.id IS NULL;
N+1 Query Problem
What is the N+1 query problem in SQL?
-- N+1 pattern in application code (pseudocode):
users = query("SELECT * FROM users WHERE country = 'US'") -- 1 query
for user in users:
orders = query("SELECT * FROM orders WHERE user_id = ?", user.id) -- N queries
-- Total: 1 + N queries (if 1000 users, that's 1001 queries!)
-- Solution: JOIN or subquery
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US'; -- 1 query
-- Or with aggregation
SELECT
u.*,
COALESCE(order_stats.count, 0) as order_count,
COALESCE(order_stats.total, 0) as order_total
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) as count, SUM(amount) as total
FROM orders
GROUP BY user_id
) order_stats ON order_stats.user_id = u.id
WHERE u.country = 'US';
4. Transactions & Locking
Understanding transactions is critical for data integrity and concurrency.
ACID Properties
Explain ACID properties with examples.
- Atomicity: All or nothing
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If either fails, both are rolled back
COMMIT;
- Consistency: Database moves from one valid state to another
-- Constraints ensure consistency
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
-- Transfer that would make balance negative will fail
- Isolation: Concurrent transactions don't interfere
-- Transaction A reads balance, Transaction B updates it
-- Isolation level determines what A sees
- Durability: Committed changes survive failures
COMMIT; -- After this, data is safe even if server crashes
Isolation Levels
What are the four isolation levels and their trade-offs?
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Yes | Yes | Yes | Fastest |
| READ COMMITTED | No | Yes | Yes | Fast |
| REPEATABLE READ | No | No | Yes* | Medium |
| SERIALIZABLE | No | No | No | Slowest |
*PostgreSQL's REPEATABLE READ also prevents phantom reads via MVCC.
-- Set isolation level
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- All reads in this transaction see consistent snapshot
SELECT balance FROM accounts WHERE id = 1; -- Returns 1000
-- Even if another transaction commits a change...
SELECT balance FROM accounts WHERE id = 1; -- Still returns 1000
COMMIT;
-- Serializable: Full isolation, may fail with serialization error
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts; -- Snapshot taken; conflicting concurrent writes are detected at commit
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
COMMIT; -- May fail: "could not serialize access"
MVCC Deep Dive
How does PostgreSQL implement MVCC?
Row Version Structure:

┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (100)   │   (0)    │  id=1, name='Alice'      │  Created by txn 100
└──────────┴──────────┴──────────────────────────┘

After UPDATE by transaction 150:

┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (100)   │  (150)   │  id=1, name='Alice'      │  Old version (dead)
└──────────┴──────────┴──────────────────────────┘
┌──────────┬──────────┬──────────────────────────┐
│   xmin   │   xmax   │        Row Data          │
│  (150)   │   (0)    │  id=1, name='Alice Smith'│  New version
└──────────┴──────────┴──────────────────────────┘
Visibility rules:
- Row is visible if: xmin is committed AND (xmax = 0 OR xmax is not committed OR xmax > my_txn_id)
- Each transaction sees a consistent snapshot
- Dead rows cleaned up by VACUUM
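The visibility rule above can be sketched as a small function (a deliberate simplification of PostgreSQL's actual heap-tuple visibility checks; the `committed` set stands in for the commit log):

```python
def row_visible(xmin, xmax, my_txn_id, committed):
    """Simplified MVCC visibility check.

    A row version is visible if its creating transaction committed and it
    has not been deleted by a committed transaction this snapshot can see.
    """
    if xmin not in committed:
        return False               # creator never committed
    if xmax == 0:
        return True                # never deleted/updated
    if xmax not in committed:
        return True                # deleter still in flight
    return xmax > my_txn_id        # deleted after my snapshot started

committed = {100, 150}

# After the UPDATE above, a later transaction (id 200) sees only the new version:
print(row_visible(100, 150, 200, committed))  # False (old version, dead)
print(row_visible(150, 0, 200, committed))    # True  (new version)

# A transaction that took its snapshot before txn 150 still sees the old version:
print(row_visible(100, 150, 120, committed))  # True
```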
-- Check transaction ID
SELECT txid_current();
-- See dead rows (before vacuum)
SELECT * FROM pg_stat_user_tables WHERE relname = 'accounts';
-- n_dead_tup shows dead rows needing vacuum
Locking Strategies
What's the difference between optimistic and pessimistic locking?
-- Pessimistic locking: Lock row immediately
BEGIN;
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE;
-- Row is locked, other transactions wait
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1;
COMMIT;
-- FOR UPDATE variants:
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE NOWAIT; -- Fail immediately if locked
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE SKIP LOCKED; -- Skip locked rows
-- Optimistic locking: Check version at update time
-- Application maintains version column
UPDATE inventory
SET quantity = quantity - 1, version = version + 1
WHERE product_id = 1 AND version = 5;
-- If 0 rows affected, someone else modified it → retry
-- With timestamp
UPDATE inventory
SET quantity = quantity - 1, updated_at = NOW()
WHERE product_id = 1 AND updated_at = '2024-01-15 10:30:00';
| Approach | Pros | Cons | Use When |
|---|---|---|---|
| Pessimistic | Guaranteed consistency | Blocking, potential deadlocks | High contention, short transactions |
| Optimistic | No blocking, better throughput | Retry logic needed | Low contention, read-heavy |
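The optimistic pattern is easy to demonstrate end-to-end with SQLite (a sketch; the table and function names are illustrative):

```python
import sqlite3

# Optimistic locking sketch: the UPDATE succeeds only if the version we
# read earlier is still current; 0 rows affected means "someone else won, retry".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INT, quantity INT, version INT)")
conn.execute("INSERT INTO inventory VALUES (1, 10, 5)")

def decrement(conn, product_id, expected_version):
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - 1, version = version + 1 "
        "WHERE product_id = ? AND version = ?",
        (product_id, expected_version),
    )
    return cur.rowcount == 1  # False -> concurrent modification, caller retries

first = decrement(conn, 1, 5)   # version matched: quantity 9, version 6
second = decrement(conn, 1, 5)  # stale version: no rows affected
row = conn.execute("SELECT quantity, version FROM inventory").fetchone()
print(first, second, row)  # True False (9, 6)
```

On the `second` call the caller would re-read the row to pick up version 6, then retry the update.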
5. Advanced Data Types
Modern databases offer powerful data types beyond basic scalars.
JSON and JSONB
When should you use JSON vs JSONB in PostgreSQL?
-- JSONB: Binary storage, most common choice
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
attributes JSONB -- Flexible schema for product attributes
);
-- Insert JSONB
INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "storage": "512GB"}}');
-- Query JSONB
SELECT * FROM products WHERE attributes->>'brand' = 'Dell';
SELECT * FROM products WHERE attributes @> '{"brand": "Dell"}';
SELECT * FROM products WHERE attributes->'specs'->>'ram' = '16';
-- JSONB operators
-- -> : Get JSON element (returns JSON)
-- ->> : Get JSON element as text
-- @> : Contains
-- ? : Key exists
-- ?| : Any key exists
-- ?& : All keys exist
-- Index JSONB for fast queries
CREATE INDEX idx_attributes ON products USING GIN (attributes);
CREATE INDEX idx_brand ON products USING GIN ((attributes->'brand'));
-- JSONB vs JSON
-- JSON: Preserves whitespace, key order, duplicates. Parsed on each access.
-- JSONB: Binary format, duplicate keys removed, indexable. Faster queries, slightly slower insert.
How do you query nested JSONB efficiently?
-- Path queries with jsonb_path_query
SELECT * FROM products
WHERE jsonb_path_exists(attributes, '$.specs.ram ? (@ > 8)');
-- Aggregate JSON
SELECT
attributes->>'brand' as brand,
COUNT(*),
AVG((attributes->'specs'->>'ram')::int) as avg_ram
FROM products
GROUP BY attributes->>'brand';
-- Update nested JSONB
UPDATE products
SET attributes = jsonb_set(attributes, '{specs,ram}', '32')
WHERE id = 1;
-- Add key
UPDATE products
SET attributes = attributes || '{"warranty": "2 years"}'
WHERE id = 1;
-- Remove key
UPDATE products
SET attributes = attributes - 'warranty'
WHERE id = 1;
Arrays
-- Array column
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT[]
);
INSERT INTO articles (title, tags) VALUES
('PostgreSQL Tips', ARRAY['database', 'postgresql', 'performance']);
-- Query arrays
SELECT * FROM articles WHERE 'postgresql' = ANY(tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'database'];
SELECT * FROM articles WHERE tags && ARRAY['mysql', 'postgresql']; -- Overlap
-- Array functions
SELECT array_length(tags, 1) FROM articles;
SELECT unnest(tags) FROM articles; -- Expand to rows
-- Index for array queries
CREATE INDEX idx_tags ON articles USING GIN (tags);
Full-Text Search
-- Create text search column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
to_tsvector('english', title || ' ' || COALESCE(content, ''));
CREATE INDEX idx_search ON articles USING GIN (search_vector);
-- Search with ranking
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Phrase search
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'database performance');
-- Auto-update with trigger
CREATE TRIGGER update_search_vector
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, content);
6. Replication & High Availability
Production databases need redundancy and failover capabilities.
PostgreSQL Streaming Replication
Primary-Replica Architecture:

┌─────────────────┐         ┌─────────────────┐
│     Primary     │   WAL   │     Replica     │
│                 │────────▶│                 │
│   Read/Write    │ Stream  │    Read Only    │
└─────────────────┘         └─────────────────┘
        │                           │
        ▼                           ▼
   ┌─────────┐                 ┌─────────┐
   │   App   │                 │   App   │
   │ (Write) │                 │  (Read) │
   └─────────┘                 └─────────┘
# Primary configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
synchronous_commit = on  # or 'remote_apply' with a synchronous replica
-- Replica setup
-- pg_basebackup -h primary -D /var/lib/postgresql/data -U replicator -P
-- standby.signal file indicates replica mode
-- primary_conninfo in postgresql.auto.conf
-- Check replication status (on primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
-- Check replication lag (run on the replica)
SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;
MySQL Replication
# Source (primary) configuration (my.cnf)
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
# Replica configuration (my.cnf)
[mysqld]
server-id = 2
relay_log = relay-bin
read_only = ON
gtid_mode = ON
enforce_gtid_consistency = ON
-- Set up replication on replica
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'primary.example.com',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'password',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
-- Check status
SHOW REPLICA STATUS\G
Connection Pooling
Why is connection pooling important?
Without pooling:

App Server                         Database
┌─────────┐    100 requests       ┌─────────┐
│         │──────────────────────▶│  100    │
│   App   │    100 connections    │  conns  │  Memory exhaustion!
│         │                       │         │
└─────────┘                       └─────────┘

With pooling (PgBouncer):

App Server         Pooler             Database
┌─────────┐      ┌─────────┐         ┌─────────┐
│         │ 100  │         │   20    │         │
│   App   │─────▶│PgBouncer│────────▶│   20    │
│         │conns │  pool   │  conns  │  conns  │
└─────────┘      └─────────┘         └─────────┘
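The core idea — many clients multiplexed over a few connections — can be sketched with a queue-backed toy pool (illustrative only, not how PgBouncer is implemented):

```python
import queue
import threading

class Pool:
    """Toy connection pool: N clients share a fixed set of 'connections'."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put("conn-%d" % i)  # stand-in for a real connection

    def run(self, work):
        conn = self._free.get()            # blocks if all connections are busy
        try:
            return work(conn)
        finally:
            self._free.put(conn)           # returned to the pool, not closed

pool = Pool(size=5)
used = set()

def client():
    pool.run(lambda conn: used.add(conn))

threads = [threading.Thread(target=client) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 100 concurrent clients, yet the database-side connection count never exceeds 5
print(len(used) <= 5)  # True
```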
# PgBouncer configuration
[databases]
mydb = host=localhost dbname=mydb
[pgbouncer]
pool_mode = transaction # or session, statement
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
# Pool modes:
# session: Connection held until client disconnects
# transaction: Connection returned after each transaction (most common)
# statement: Connection returned after each statement (no transactions)
7. Performance Tuning
Proper configuration significantly impacts database performance.
Key Configuration Parameters
PostgreSQL tuning:
# Memory
shared_buffers = 4GB # 25% of RAM for dedicated server
effective_cache_size = 12GB # 75% of RAM (estimate of OS cache)
work_mem = 256MB # Per-operation memory (sorts, hashes)
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
# WAL
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Planner
random_page_cost = 1.1 # Lower for SSD (default 4.0)
effective_io_concurrency = 200 # For SSD
# Connections
max_connections = 200 # Use connection pooling for more
MySQL/InnoDB tuning:
# Buffer pool (most important)
innodb_buffer_pool_size = 12G # 70-80% of RAM
innodb_buffer_pool_instances = 8 # 1 per GB up to 8
# Logging
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1 # 1=durable, 2=faster
# I/O
innodb_io_capacity = 2000 # SSD
innodb_io_capacity_max = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
Table Partitioning
When and how should you partition tables?
-- PostgreSQL declarative partitioning
CREATE TABLE orders (
id BIGSERIAL,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Queries automatically route to correct partition
SELECT * FROM orders WHERE order_date = '2024-02-15';
-- Only scans orders_2024_q1
-- Partition pruning
EXPLAIN SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- Shows only orders_2024_q1 scanned
Partition strategies:
- Range: Date, numeric ranges (most common)
- List: Discrete values (country, status)
- Hash: Even distribution (no natural range)
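Range partition pruning amounts to a bounds check against each partition's range; a toy router in Python (partition names match the example above):

```python
from datetime import date

# Each range partition owns a half-open interval [start, end),
# mirroring FOR VALUES FROM ... TO ... above.
partitions = [
    ("orders_2024_q1", date(2024, 1, 1), date(2024, 4, 1)),
    ("orders_2024_q2", date(2024, 4, 1), date(2024, 7, 1)),
]

def route(order_date):
    # The planner's partition pruning performs the same bounds comparison
    for name, start, end in partitions:
        if start <= order_date < end:
            return name
    raise ValueError("no partition covers %s" % order_date)

print(route(date(2024, 2, 15)))  # orders_2024_q1
print(route(date(2024, 5, 1)))   # orders_2024_q2
```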
VACUUM and ANALYZE
Why is VACUUM important in PostgreSQL?
-- VACUUM: Reclaims dead row space
VACUUM orders; -- Standard vacuum
VACUUM FULL orders; -- Rewrites table, locks it
VACUUM ANALYZE orders; -- Vacuum + update statistics
-- Check vacuum status
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables;
-- ANALYZE: Updates query planner statistics
ANALYZE orders;
-- Auto-vacuum settings
-- autovacuum_vacuum_threshold = 50
-- autovacuum_vacuum_scale_factor = 0.2
-- Vacuum when: dead_tuples > threshold + scale_factor * table_size
-- For high-write tables, tune per-table
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02
);
Quick Reference: Common Interview Questions
| Topic | Key Points |
|---|---|
| PostgreSQL vs MySQL | Features, MVCC implementation, use cases |
| B-tree index | O(log n), sorted, composite column order matters |
| Index types | B-tree, Hash, GIN, GiST, BRIN - know when to use each |
| EXPLAIN ANALYZE | Read plans, identify seq scans, check row estimates |
| Isolation levels | RC, RR, Serializable - trade-offs |
| MVCC | xmin/xmax, snapshots, VACUUM |
| Locking | Optimistic vs pessimistic, FOR UPDATE |
| JSONB | Operators (@>, ?), GIN indexing |
| Replication | Streaming, sync vs async, lag monitoring |
| Tuning | shared_buffers, work_mem, connection pooling |
Related Articles
- SQL Joins Interview Guide - JOIN fundamentals
- System Design Interview Guide - Database architecture patterns
- Hibernate & JPA Interview Guide - ORM integration
- Complete Java Backend Developer Interview Guide - Full Java backend preparation
