Apache Kafka has become the backbone of modern event-driven architectures. Whether you're building real-time data pipelines, implementing microservices communication, or processing millions of events per second, Kafka knowledge is essential for backend and data engineering roles.
This guide covers the Kafka concepts that interviewers actually ask about—from core fundamentals to production operational knowledge.
1. Kafka Fundamentals
Understanding Kafka's architecture is the foundation for all advanced topics.
Core Components
What are the main components of Kafka architecture?
┌─────────────────────────────────────────────────────────┐
│ Kafka Cluster │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Broker 1│ │ Broker 2│ │ Broker 3│ │
│ │ │ │ │ │ │ │
│ │ Topic A │ │ Topic A │ │ Topic A │ │
│ │ Part 0 │ │ Part 1 │ │ Part 2 │ │
│ │ (Leader)│ │ (Leader)│ │ (Leader)│ │
│ │ │ │ │ │ │ │
│ │ Topic A │ │ Topic A │ │ Topic A │ │
│ │ Part 1 │ │ Part 0 │ │ Part 0 │ │
│ │(Replica)│ │(Replica)│ │(Replica)│ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────┘
     ▲                              │
     │                              ▼
┌─────────┐                   ┌─────────┐
│Producer │                   │Consumer │
└─────────┘                   │  Group  │
                              └─────────┘
- Broker: A Kafka server that stores data and serves clients
- Topic: A category/feed name to which records are published
- Partition: An ordered, immutable sequence of records within a topic
- Replica: A copy of a partition for fault tolerance
- Leader: The replica that handles all reads and writes for a partition
- Consumer Group: A set of consumers that cooperate to consume a topic
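These components map directly onto the admin API. A minimal sketch (assuming a broker on localhost:9092; the topic name is illustrative) that creates a topic with three partitions, each replicated to three brokers:
// AdminClient, AdminClientConfig and NewTopic live in org.apache.kafka.clients.admin
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient admin = AdminClient.create(adminProps)) {
    // 3 partitions spread across the cluster, replication factor 3 for fault tolerance
    NewTopic orders = new NewTopic("orders", 3, (short) 3);
    admin.createTopics(List.of(orders)).all().get(); // blocks until the controller confirms
}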
What is a Kafka topic and how does it differ from a traditional message queue?
A topic is a logical channel for publishing and subscribing to streams of records. Unlike traditional queues:
| Aspect | Traditional Queue | Kafka Topic |
|---|---|---|
| Message retention | Deleted after consumption | Retained based on policy |
| Multiple consumers | Message goes to one consumer | All consumer groups get all messages |
| Replay capability | Not possible | Can replay from any offset |
| Ordering | Queue-wide (typically) | Per-partition only |
| Scaling | Limited | Horizontal via partitions |
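The replay row is the one interviewers probe most. A minimal sketch (assuming consumer props configured as in the consumer examples later in this guide) that re-reads partition 0 of a topic from the beginning:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
TopicPartition partition0 = new TopicPartition("orders", 0);
consumer.assign(Collections.singletonList(partition0)); // assign() bypasses group management
consumer.seek(partition0, 0L);                          // or seekToBeginning(...), or any retained offset
ConsumerRecords<String, String> replayed = consumer.poll(Duration.ofMillis(500));
A traditional queue cannot do this: acknowledged messages are gone, whereas Kafka only removes data when retention expires.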
Zookeeper vs KRaft
What is the role of Zookeeper in Kafka, and what is KRaft?
Traditionally, Kafka used Zookeeper for:
- Cluster membership and broker registration
- Topic configuration and partition metadata
- Controller election
- ACLs and quotas storage
KRaft (Kafka Raft) is the new metadata management system that eliminates Zookeeper:
# KRaft mode configuration (Kafka 3.3+)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
Benefits of KRaft:
- Simplified operations (one system instead of two)
- Better scalability (millions of partitions)
- Faster controller failover
- Reduced operational complexity
Interview tip: KRaft has been production-ready since Kafka 3.3; ZooKeeper mode is deprecated and removed entirely in Kafka 4.0.
Partitions and Ordering
How do partitions affect message ordering?
Kafka only guarantees ordering within a single partition:
// Messages with same key go to same partition (ordered)
producer.send(new ProducerRecord<>("orders", "customer-123", order1));
producer.send(new ProducerRecord<>("orders", "customer-123", order2));
// order1 always processed before order2 for customer-123
// Messages with different keys may go to different partitions
producer.send(new ProducerRecord<>("orders", "customer-456", order3));
// order3 may be processed before order1 or order2
Default partition assignment (for records with a non-null key):
partition = murmur2(keyBytes) % numberOfPartitions
Keyless records use the sticky partitioner (Kafka 2.4+), which fills a batch for one partition before moving to the next.
How many partitions should a topic have?
Consider:
- Parallelism: More partitions = more parallel consumers
- Throughput: Each partition can handle ~10 MB/s writes
- Latency: More partitions = more end-to-end latency
- Memory: Each partition uses broker memory
- Recovery time: More partitions = longer recovery after broker failure
Rule of thumb: Start with max(expected_throughput / 10MB, number_of_consumers) and adjust based on monitoring.
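A worked example of that rule of thumb, as a sketch (the helper and the ~10 MB/s figure are just the heuristic above, not a Kafka API):
// Hypothetical helper applying the rule of thumb
static int startingPartitionCount(double expectedWriteMBPerSec, int plannedConsumers) {
    int forThroughput = (int) Math.ceil(expectedWriteMBPerSec / 10.0); // ~10 MB/s per partition
    return Math.max(forThroughput, plannedConsumers);
}
// e.g. 50 MB/s of expected writes and 8 consumers -> max(5, 8) = 8 partitions to start
int partitions = startingPartitionCount(50, 8);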
2. Producer Deep Dive
Understanding producer configuration is critical for reliable message delivery.
Acknowledgment Modes
Explain the different acks settings and their trade-offs.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// acks=0: Fire and forget (fastest, least reliable)
props.put("acks", "0");
// acks=1: Leader acknowledgment (balanced)
props.put("acks", "1");
// acks=all: All in-sync replicas acknowledge (slowest, most reliable)
props.put("acks", "all");| Setting | Behavior | Durability | Latency |
|---|---|---|---|
| acks=0 | Don't wait for acknowledgment | May lose messages | Lowest |
| acks=1 | Wait for leader to write | May lose if leader fails before replication | Medium |
| acks=all | Wait for all ISR to write | No loss while at least min.insync.replicas replicas stay in sync | Highest |
Idempotent Producer
What is an idempotent producer and why is it important?
An idempotent producer ensures exactly-once delivery to a single partition, even with retries:
props.put("enable.idempotence", "true");
// Requires (and, in modern clients, defaults to):
// - acks=all
// - retries > 0 (default Integer.MAX_VALUE)
// - max.in.flight.requests.per.connection <= 5
How it works:
- Producer gets a unique Producer ID (PID) on initialization
- Each message gets a sequence number
- Broker deduplicates based on PID + sequence number
- Retried messages with same sequence are ignored
Batching and Compression
How does Kafka batching work and how do you tune it?
// Batch size in bytes
props.put("batch.size", 16384); // 16 KB default
// Maximum wait time to fill batch
props.put("linger.ms", 5); // Wait up to 5ms for more messages
// Compression
props.put("compression.type", "lz4"); // Options: none, gzip, snappy, lz4, zstdBatching trade-offs:
- Larger batches: Better throughput, higher latency
- Smaller batches: Lower latency, more overhead
- Compression: Reduces network/storage but adds CPU overhead
Interview tip: linger.ms=0 sends immediately; increase for better batching.
Partitioning Strategies
How do you control which partition a message goes to?
// 1. Key-based (default): hash(key) % partitions
producer.send(new ProducerRecord<>("topic", "key", "value"));
// 2. Explicit partition
producer.send(new ProducerRecord<>("topic", 2, "key", "value"));
// 3. Custom partitioner
public class CustomPartitioner implements Partitioner {
@Override
public int partition(String topic, Object key, byte[] keyBytes,
Object value, byte[] valueBytes, Cluster cluster) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
// Route VIP customers to dedicated partition
if (key.toString().startsWith("vip-")) {
return 0; // VIP partition
}
// Default: hash-based distribution for others
// Mask the sign bit so a hash code of Integer.MIN_VALUE cannot produce a negative index
return (key.hashCode() & Integer.MAX_VALUE) % (partitions.size() - 1) + 1;
}
@Override
public void close() {}
@Override
public void configure(Map<String, ?> configs) {}
}
3. Consumer Deep Dive
Consumer configuration affects reliability, throughput, and exactly-once semantics.
Consumer Groups
How do consumer groups enable scalable consumption?
Topic: orders (3 partitions)
┌─────────┬─────────┬─────────┐
│ Part 0 │ Part 1 │ Part 2 │
└────┬────┴────┬────┴────┬────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────┐
│ Consumer Group A │
│ ┌─────────┐ ┌─────────┐ │
│ │Consumer1│ │Consumer2│ │
│ │ Part 0 │ │Part 1,2 │ │
│ └─────────┘ └─────────┘ │
└─────────────────────────────┘
┌─────────────────────────────┐
│ Consumer Group B │
│ ┌─────────────────────────┐ │
│ │ Consumer 1 │ │
│ │ Part 0, 1, 2 │ │
│ └─────────────────────────┘ │
└─────────────────────────────┘
Key rules:
- Each partition assigned to exactly one consumer per group
- Multiple groups can consume same topic independently
- Max effective consumers = number of partitions
Offset Management
How do consumers track their position?
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Auto commit (default, at-least-once)
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "5000");
// Manual commit for more control
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders"));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
processRecord(record);
}
// Synchronous commit (blocks until confirmed)
consumer.commitSync();
// Or async commit (non-blocking)
consumer.commitAsync((offsets, exception) -> {
if (exception != null) {
log.error("Commit failed", exception);
}
});
}
What happens when a new consumer joins a group?
A rebalance occurs:
- All consumers stop fetching
- Group coordinator reassigns partitions
- Consumers resume from last committed offsets
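Applications can hook into these steps with a rebalance listener; a minimal sketch (committing on revocation is one common pattern, not the only option):
consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before ownership moves: commit progress so the next owner resumes cleanly
        consumer.commitSync();
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called once the coordinator hands out partitions: warm caches, optionally seek
        log.info("Assigned: {}", partitions);
    }
});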
Rebalance strategies:
- Eager: Stop all, reassign all (default before 2.4)
- Cooperative (Incremental): Only reassign affected partitions
// Enable cooperative rebalancing
props.put("partition.assignment.strategy",
"org.apache.kafka.clients.consumer.CooperativeStickyAssignor");Exactly-Once Semantics
How do you achieve exactly-once processing with Kafka?
Three levels of exactly-once:
1. Idempotent producer (producer → Kafka):
enable.idempotence=true
2. Transactional producer (across partitions):
props.put("transactional.id", "my-transactional-id");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>("topic1", "key", "value1"));
producer.send(new ProducerRecord<>("topic2", "key", "value2"));
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
}
3. Read-process-write (end-to-end):
// Consumer reads with isolation
props.put("isolation.level", "read_committed");
// Process and produce atomically
producer.beginTransaction();
for (ConsumerRecord<String, String> record : records) {
ProducerRecord<String, String> output = process(record);
producer.send(output);
}
// Commit consumer offsets as part of transaction
producer.sendOffsetsToTransaction(offsets, consumerGroupId);
producer.commitTransaction();
4. Kafka Streams & ksqlDB
Stream processing allows real-time transformations on Kafka data.
Kafka Streams Basics
What is Kafka Streams and how does it differ from other stream processors?
Kafka Streams is a client library (not a cluster) for building stream processing applications:
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-processor");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
StreamsBuilder builder = new StreamsBuilder();
// Read from topic as stream
KStream<String, Order> orders = builder.stream("orders");
// Transform: filter, map, flatMap
KStream<String, Order> highValueOrders = orders
.filter((key, order) -> order.getAmount() > 1000)
.mapValues(order -> enrichOrder(order));
// Write to output topic
highValueOrders.to("high-value-orders");
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Advantages:
- No separate cluster needed
- Exactly-once processing built-in
- Elastic scaling (just start more instances)
- Fault-tolerant with automatic recovery
KTable vs KStream
What's the difference between KStream and KTable?
// KStream: Append-only stream of events
// Each record is independent
KStream<String, PageView> pageViews = builder.stream("page-views");
// Records: (user1, view1), (user1, view2), (user2, view1)
// KTable: Changelog stream, latest value per key
// Represents current state
KTable<String, UserProfile> users = builder.table("users");
// State: {user1: profile1, user2: profile2}
// Join stream with table (enrichment)
KStream<String, EnrichedPageView> enriched = pageViews.join(
users,
(pageView, profile) -> new EnrichedPageView(pageView, profile)
);
| Aspect | KStream | KTable |
|---|---|---|
| Semantics | Event stream | Changelog / state |
| Records | All events | Latest per key |
| Memory | Stateless | Stores state |
| Operations | map, filter, flatMap, join | aggregate, reduce, join |
Windowing
How do you perform windowed aggregations?
KStream<String, Transaction> transactions = builder.stream("transactions");
// Tumbling window: fixed, non-overlapping
KTable<Windowed<String>, Long> hourlyCounts = transactions
.groupByKey()
.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
.count();
// Hopping window: fixed size, overlapping
KTable<Windowed<String>, Long> hoppingCounts = transactions
.groupByKey()
.windowedBy(TimeWindows.ofSizeAndGrace(
Duration.ofMinutes(5),
Duration.ofMinutes(1)
).advanceBy(Duration.ofMinutes(1)))
.count();
// Session window: activity-based, variable size
KTable<Windowed<String>, Long> sessionCounts = transactions
.groupByKey()
.windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(30)))
.count();
ksqlDB
When would you use ksqlDB instead of Kafka Streams?
-- Create a stream from a topic
CREATE STREAM orders (
order_id VARCHAR KEY,
customer_id VARCHAR,
amount DECIMAL(10,2),
order_time TIMESTAMP
) WITH (
KAFKA_TOPIC='orders',
VALUE_FORMAT='JSON'
);
-- Real-time aggregation
CREATE TABLE hourly_revenue AS
SELECT
customer_id,
WINDOWSTART as window_start,
SUM(amount) as total_revenue,
COUNT(*) as order_count
FROM orders
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY customer_id
EMIT CHANGES;
-- Join streams
CREATE STREAM enriched_orders AS
SELECT
o.order_id,
o.amount,
c.name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
EMIT CHANGES;
Use ksqlDB when:
- SQL is preferred over Java code
- Rapid prototyping needed
- Simpler transformations and aggregations
- Team has SQL expertise
Use Kafka Streams when:
- Complex business logic required
- Need full programmatic control
- Custom serialization/processing
- Embedding in existing application
5. Spring Kafka Integration
Spring Kafka simplifies Kafka integration in Spring Boot applications.
Basic Configuration
How do you configure Kafka in Spring Boot?
# application.yml
spring:
kafka:
bootstrap-servers: localhost:9092
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
acks: all
retries: 3
consumer:
group-id: my-group
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
auto-offset-reset: earliest
properties:
spring.json.trusted.packages: com.example.dto
Producer with KafkaTemplate
@Service
@Slf4j
@RequiredArgsConstructor
public class OrderProducer {
private final KafkaTemplate<String, Order> kafkaTemplate;
// Fire and forget
public void sendOrder(Order order) {
kafkaTemplate.send("orders", order.getId(), order);
}
// With callback
public void sendOrderWithCallback(Order order) {
CompletableFuture<SendResult<String, Order>> future =
kafkaTemplate.send("orders", order.getId(), order);
future.whenComplete((result, ex) -> {
if (ex == null) {
log.info("Sent order {} to partition {} offset {}",
order.getId(),
result.getRecordMetadata().partition(),
result.getRecordMetadata().offset());
} else {
log.error("Failed to send order {}", order.getId(), ex);
}
});
}
// Synchronous (blocking)
public void sendOrderSync(Order order) throws Exception {
kafkaTemplate.send("orders", order.getId(), order).get(10, TimeUnit.SECONDS);
}
}
Consumer with @KafkaListener
@Service
@Slf4j
public class OrderConsumer {
// Basic listener
@KafkaListener(topics = "orders", groupId = "order-processor")
public void processOrder(Order order) {
log.info("Received order: {}", order.getId());
// Process order
}
// With metadata
@KafkaListener(topics = "orders", groupId = "order-processor")
public void processOrderWithMetadata(
@Payload Order order,
@Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
@Header(KafkaHeaders.OFFSET) long offset,
@Header(KafkaHeaders.RECEIVED_TIMESTAMP) long timestamp) {
log.info("Received order {} from partition {} at offset {}",
order.getId(), partition, offset);
}
// Batch processing
@KafkaListener(topics = "orders", groupId = "batch-processor",
containerFactory = "batchFactory")
public void processOrderBatch(List<Order> orders) {
log.info("Received batch of {} orders", orders.size());
orders.forEach(this::processOrder);
}
// Manual acknowledgment
@KafkaListener(topics = "orders", groupId = "manual-ack")
public void processWithManualAck(Order order, Acknowledgment ack) {
try {
processOrder(order);
ack.acknowledge(); // Commit offset only after successful processing
} catch (Exception e) {
// Don't ack - message will be redelivered
throw e;
}
}
}
Error Handling and Retry
@Configuration
public class KafkaConfig {
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Order> kafkaListenerContainerFactory(
ConsumerFactory<String, Order> consumerFactory, KafkaTemplate<String, Order> kafkaTemplate) {
ConcurrentKafkaListenerContainerFactory<String, Order> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory);
// Retry configuration
factory.setCommonErrorHandler(new DefaultErrorHandler(
new DeadLetterPublishingRecoverer(kafkaTemplate),
new FixedBackOff(1000L, 3) // 3 retries, 1 second apart
));
return factory;
}
}
// Or use retry topics (Spring Kafka 2.7+)
@RetryableTopic(
attempts = "3",
backoff = @Backoff(delay = 1000, multiplier = 2),
dltTopicSuffix = ".DLT",
autoCreateTopics = "true"
)
@KafkaListener(topics = "orders")
public void processWithRetryTopic(Order order) {
// Failures are retried on auto-created retry topics (suffix depends on the naming strategy), then routed to orders.DLT
processOrder(order);
}
@DltHandler
public void handleDlt(Order order) {
log.error("Message exhausted retries: {}", order);
// Alert, store for manual review, etc.
}
6. Reliability & Performance
Production Kafka deployments require careful tuning for reliability and performance.
Replication and ISR
How does Kafka replication work?
Partition 0 with replication.factor=3:
Broker 1 (Leader) Broker 2 (Follower) Broker 3 (Follower)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Message 1 │ │ Message 1 │ │ Message 1 │
│ Message 2 │ │ Message 2 │ │ Message 2 │
│ Message 3 │◄─│ Message 3 │ │ (catching up) │
│ (committed) │ │ (in ISR) │ │ (not in ISR) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
In-Sync Replicas (ISR): Replicas that are:
- Alive (maintaining an active session with the controller, or ZooKeeper in legacy mode)
- Caught up with the leader within replica.lag.time.max.ms
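ISR membership can be inspected programmatically as well as via JMX; a hedged sketch using the AdminClient (topic name illustrative; adminProps holds bootstrap.servers as in the earlier AdminClient example):
try (AdminClient admin = AdminClient.create(adminProps)) {
    TopicDescription topic = admin.describeTopics(List.of("orders"))
            .allTopicNames().get()
            .get("orders");
    for (TopicPartitionInfo p : topic.partitions()) {
        // A partition is under-replicated when the ISR is smaller than the full replica set
        boolean underReplicated = p.isr().size() < p.replicas().size();
        System.out.printf("partition %d: isr=%d/%d underReplicated=%b%n",
                p.partition(), p.isr().size(), p.replicas().size(), underReplicated);
    }
}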
What is min.insync.replicas and why is it important?
# Topic configuration
min.insync.replicas=2
With acks=all and min.insync.replicas=2:
- Write succeeds only if at least 2 replicas acknowledge
- If ISR drops below 2, writes are rejected (NOT_ENOUGH_REPLICAS)
- Prevents data loss even if one broker fails
Common configuration for durability:
replication.factor=3
min.insync.replicas=2
acks=all
Throughput Tuning
How do you optimize Kafka for high throughput?
Producer tuning:
# Larger batches
batch.size=65536
linger.ms=10
# Compression
compression.type=lz4
# More in-flight requests (with idempotence)
max.in.flight.requests.per.connection=5
# Larger buffer
buffer.memory=67108864
Consumer tuning:
# Fetch more data per request
fetch.min.bytes=1048576
fetch.max.wait.ms=500
# Larger fetch size
max.partition.fetch.bytes=1048576
# Process in batches
max.poll.records=500
Broker tuning:
# More I/O threads
num.io.threads=16
num.network.threads=8
# Socket buffers
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
Latency Tuning
How do you optimize Kafka for low latency?
# Producer: Send immediately
linger.ms=0
batch.size=16384
# Consumer: Fetch frequently
fetch.max.wait.ms=0
fetch.min.bytes=1
# Broker: force a flush per message (rarely advisable; Kafka relies on replication, not per-message fsync, for durability)
log.flush.interval.messages=1 # Careful: severely reduces throughput
Trade-off: Lower latency typically means lower throughput.
7. Operations & Monitoring
Operational excellence is key to running Kafka in production.
Essential Metrics
What metrics should you monitor for Kafka?
Broker metrics:
- kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec - Message rate
- kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions - Replication health
- kafka.controller:type=KafkaController,name=OfflinePartitionsCount - Availability
- kafka.network:type=RequestMetrics,name=RequestsPerSec - Request rate
Producer metrics:
- record-send-rate - Messages sent per second
- record-error-rate - Failed sends
- request-latency-avg - Average request latency
- batch-size-avg - Average batch size
Consumer metrics:
- records-consumed-rate - Consumption rate
- records-lag-max - Maximum lag across partitions
- commit-latency-avg - Offset commit latency
- rebalance-rate-and-time - Rebalance frequency
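Client metrics are also available programmatically, not just over JMX; a minimal sketch that reads records-lag-max straight off a running consumer:
for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
    if ("records-lag-max".equals(entry.getKey().name())) {
        // metricValue() returns Object; fetch metrics report doubles
        log.info("{} [{}] = {}", entry.getKey().name(), entry.getKey().group(), entry.getValue().metricValue());
    }
}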
Consumer Lag Monitoring
How do you detect and troubleshoot consumer lag?
# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group my-consumer-group
# Output:
# TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
# orders 0 1000 1500 500
# orders 1 2000 2100 100
Causes of consumer lag:
- Slow processing: Optimize consumer logic or increase parallelism
- Insufficient consumers: Add more consumers (up to partition count)
- Network issues: Check connectivity and bandwidth
- Rebalancing storms: Use cooperative rebalancing, increase session.timeout.ms
- GC pauses: Tune JVM garbage collection
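For the insufficient-consumers case in particular, Spring Kafka can raise parallelism without deploying more instances; a hedged sketch (listener name illustrative, still capped by the partition count):
// concurrency = "3" starts three consumer threads in this group for this listener
@KafkaListener(topics = "orders", groupId = "order-processor", concurrency = "3")
public void processOrderConcurrently(Order order) {
    // Each thread owns a disjoint subset of partitions, so per-key ordering is preserved
    log.info("Processing order {}", order.getId());
}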
Schema Registry
What is Schema Registry and why is it important?
Schema Registry provides:
- Centralized schema storage
- Schema evolution with compatibility checks
- Automatic serialization/deserialization
// Producer with Avro and Schema Registry
props.put("schema.registry.url", "http://localhost:8081");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// Consumer
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("specific.avro.reader", "true");Compatibility modes:
- BACKWARD: New schema can read old data
- FORWARD: Old schema can read new data
- FULL: Both backward and forward compatible
- NONE: No compatibility checks
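Tying it together, a minimal sketch that produces an Avro GenericRecord through Schema Registry (schema fields are illustrative; assumes the serializer configuration shown above):
String schemaJson = "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
        + "{\"name\":\"order_id\",\"type\":\"string\"},"
        + "{\"name\":\"amount\",\"type\":\"double\"}]}";
Schema schema = new Schema.Parser().parse(schemaJson);   // org.apache.avro.Schema
GenericRecord order = new GenericData.Record(schema);
order.put("order_id", "o-123");
order.put("amount", 42.5);
// KafkaAvroSerializer registers/looks up the schema and prefixes the payload with its registry ID
KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("orders", "o-123", order));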
Common Production Issues
What are common Kafka production issues and solutions?
| Issue | Symptoms | Solution |
|---|---|---|
| Under-replicated partitions | Alerts, slow writes | Check broker health, network, disk I/O |
| Consumer lag | Growing lag metrics | Scale consumers, optimize processing |
| Rebalancing storms | Frequent rebalances, high latency | Increase timeouts, use cooperative assignor |
| Disk full | Write failures | Add retention policies, expand storage |
| Uneven partition distribution | Some brokers overloaded | Run partition reassignment |
| Producer timeouts | Send failures | Check broker health, tune timeouts |
Quick Reference: Common Interview Questions
| Topic | Key Points |
|---|---|
| Architecture | Brokers, topics, partitions, replicas, consumer groups |
| Ordering | Guaranteed within partition only |
| Delivery guarantees | acks=0/1/all, idempotent producer, transactions |
| Consumer groups | Load balancing, rebalancing, offset management |
| Exactly-once | Idempotent + transactional producer + read_committed |
| Kafka Streams | Library, KStream vs KTable, windowing |
| Replication | ISR, min.insync.replicas, leader election |
| Monitoring | Lag, under-replicated partitions, throughput |
Related Articles
- Microservices Architecture Interview Guide - Kafka in microservices context
- System Design Interview Guide - Distributed systems fundamentals
- Spring Boot Interview Guide - Spring Kafka integration
- Complete Java Backend Developer Interview Guide - Full Java backend preparation
