Apache Kafka
What it is: Distributed event streaming platform. Publish-subscribe messaging built on a partitioned, replicated commit log. Backbone of many modern data architectures.
What It Does Best
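High throughput. Large clusters sustain millions of events per second. Throughput scales close to linearly as partitions and brokers are added, because each partition can be written and read independently.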
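A minimal sketch of why partitions allow parallelism: each record key maps to one partition, so work for different keys can proceed independently while records for the same key stay together. The hash here is an assumption for illustration (CRC32 from the standard library); Kafka's default partitioner uses murmur2. The names partition_for, spread, and NUM_PARTITIONS are hypothetical.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # hypothetical partition count for one topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition index.

    Stand-in hash: CRC32 is used only for illustration.
    Kafka's default partitioner hashes keys with murmur2.
    """
    return zlib.crc32(key) % num_partitions


def spread(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Simulate 10,000 distinct user keys and see how they distribute.
keys = [f"user-{i}".encode() for i in range(10_000)]
counts = spread(keys)
for partition in sorted(counts):
    print(partition, counts[partition])
```

The same key always lands on the same partition, which is what preserves per-key ordering. Changing the partition count changes the mapping, so it is worth choosing the count with growth in mind.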
Durability and replay. Events are written to a persistent, replicated log. Consumers can rewind to an earlier offset and reprocess historical events for as long as the retention policy keeps them.
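A toy model of the replay idea, assuming a single in-memory partition with no retention limit. It is not Kafka's storage format or client API. PartitionLog, append, and read_from are hypothetical names. The point: reading does not delete anything, so any consumer can start again from any offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log for one partition (in memory, no retention)."""

    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record and return the offset it was stored at."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int) -> list:
        """Return (offset, value) pairs from the given offset to the end.

        Reading never removes records, so the same range can be read again.
        """
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

first_pass = log.read_from(0)  # a consumer processes everything once
replay = log.read_from(0)      # rewinding to offset 0 yields the same events
tail = log.read_from(2)        # or resume from a later offset
```

In a real cluster the rewind is bounded by the topic's retention settings: once old segments are deleted, those offsets can no longer be read.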
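Ecosystem. Kafka Connect (connectors for databases, object stores, and other systems), Kafka Streams (stream processing library), Schema Registry (schema management, from Confluent rather than the Apache project). Together they form a complete platform.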
Pricing
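Free: Open source, Apache 2.0. Managed options: Confluent Cloud ($0.11/GB ingress; actual pricing varies by cluster type and region), Amazon MSK, Azure Event Hubs (Kafka-compatible endpoint, not Kafka itself).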
When to Use It
✅ Real-time data pipelines
✅ Event-driven microservices
✅ Activity tracking and monitoring
✅ Log aggregation and stream processing
When NOT to Use It
❌ Request-response patterns (use REST/gRPC)
❌ Small-scale messaging (RabbitMQ or Redis is simpler)
❌ Complex routing (use RabbitMQ)
Bottom line: The default choice for event streaming. Ubiquitous in modern data infrastructure. Not the easiest to operate, but few alternatives match its combination of throughput, durability, and ecosystem. Essential knowledge for data engineers.