Useful Data Tips

Apache Kafka

⏱️ 8 sec read 🗄️ Data Management

What it is: Distributed event streaming platform. Publish-subscribe messaging at massive scale. Backbone of modern data architectures.
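The publish-subscribe model can be sketched in a few lines, purely as a concept illustration (this is a toy in-memory broker, not the real Kafka client API — every subscriber gets its own copy of each event, and the broker retains events rather than deleting them on delivery):

```python
from collections import defaultdict

# Toy illustration of publish-subscribe: producers append events to a named
# topic; every subscriber receives its own copy, and the broker keeps the
# event in its log afterwards (retention, not a delete-on-read queue).
class ToyBroker:
    def __init__(self):
        self.topics = defaultdict(list)       # topic -> retained events
        self.subscribers = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        self.topics[topic].append(event)      # retained, not consumed away
        for cb in self.subscribers[topic]:
            cb(event)

broker = ToyBroker()
seen_by_a, seen_by_b = [], []
broker.subscribe("clicks", seen_by_a.append)
broker.subscribe("clicks", seen_by_b.append)
broker.publish("clicks", {"user": 1, "page": "/home"})
# Both subscribers received the event; the broker still holds it.
```

Decoupling producers from consumers this way is what lets new consumers attach later without touching the producers.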

What It Does Best

High throughput. Millions of events per second per cluster. Linear scaling with partitions.

Durability and replay. Persistent log storage. Reprocess historical events. Time travel through data.
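The two points above — partition-based scaling and replayable log storage — come down to one data structure: an append-only log per partition, indexed by offset. A minimal sketch (Kafka's default partitioner actually uses a murmur2 hash of the key; `md5` here is just a stand-in to show that the same key always maps to the same partition):

```python
import hashlib

# Toy partitioned, offset-indexed log. Same key -> same partition (so per-key
# ordering holds), and reads are non-destructive: any past offset can be
# replayed. Kafka hashes keys with murmur2; md5 is only an illustration.
class ToyPartitionedLog:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def read(self, partition, offset=0):
        # Replay: nothing is consumed destructively; rewind to any offset.
        return self.partitions[partition][offset:]

log = ToyPartitionedLog()
p1, _ = log.append("user-42", "login")
p2, _ = log.append("user-42", "click")
# p1 == p2: both events for user-42 land in the same partition, in order.
```

More partitions mean more independent logs that can be written and read in parallel — that is the "linear scaling" claim in practice.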

Ecosystem. Kafka Connect (integrate everything), Kafka Streams (stream processing), Schema Registry. Complete platform.

Pricing

Free: Open source, Apache 2.0 license. Managed options: Confluent Cloud (from $0.11/GB ingress), Amazon MSK, Azure Event Hubs (Kafka-protocol compatible).
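For a rough sense of what per-GB ingress pricing means, here is a back-of-envelope estimate. The event rate and size are made-up assumptions for illustration; check current Confluent Cloud pricing before budgeting:

```python
# Hedged back-of-envelope: monthly ingress cost at the quoted $0.11/GB rate.
# events_per_sec and avg_event_bytes are assumed figures, not benchmarks.
RATE_PER_GB = 0.11
events_per_sec = 10_000
avg_event_bytes = 500
seconds_per_month = 30 * 24 * 3600

gb_per_month = events_per_sec * avg_event_bytes * seconds_per_month / 1e9
monthly_ingress_cost = gb_per_month * RATE_PER_GB
print(round(gb_per_month), "GB ->", round(monthly_ingress_cost, 2), "USD")
```

At those assumed numbers, roughly 13 TB of ingress a month — a reminder that "per GB" pricing adds up fast at Kafka-scale throughput.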

When to Use It

✅ Real-time data pipelines

✅ Event-driven microservices

✅ Activity tracking and monitoring

✅ Log aggregation and stream processing

When NOT to Use It

❌ Request-response patterns (use REST/gRPC)

❌ Small-scale messaging (RabbitMQ/Redis simpler)

❌ Need complex routing (use RabbitMQ)

Bottom line: The event streaming platform. Ubiquitous in modern data infrastructure. Not the easiest to operate, but nothing else matches the combination of throughput, durability, and ecosystem. Essential knowledge for data engineers.

Visit Apache Kafka →
