This repository contains a complete setup for a minimal, single-broker Apache Kafka cluster using Docker Compose and demonstrates the fundamental Producer, Consumer, and Admin operations using the official kafka-python client library.
The goal is to run a single-broker Kafka cluster locally and practice sending and receiving structured JSON messages using the appropriate Python client configurations (serialization and deserialization).
- Docker & Docker Compose (or the Docker CLI with the `compose` plugin)
- Python 3.8+
- Dependencies: Ensure you have installed the required client library in your virtual environment:

```bash
pip install kafka-python
```
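If you have not set up the environment yet, a typical sequence (standard Python tooling, not specific to this repository) looks like this:

```bash
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install kafka-python
```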
The `docker-compose.yml` file defines our two services: `zookeeper` and `kafka`. Both images are pinned to the ZooKeeper-based 6.0.0 release for stability.
💡 Critical Configuration Concepts Solved in `docker-compose.yml`:

- External Connectivity (`KAFKA_ADVERTISED_LISTENERS`): This is essential. It tells the Kafka broker, running inside the Docker network, to advertise `localhost:9092` as its externally reachable address, so that clients on your host machine can connect successfully.
- Single-Broker Replication Fix (`KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR`): This overrides the default internal replication factor (3), setting it to `1`, which is mandatory when running only one Kafka broker locally. It prevents recurring errors related to the internal `__consumer_offsets` topic.
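For orientation, here is a minimal sketch of such a compose file. The two environment variables above are what this README confirms; the image names (`confluentinc/cp-zookeeper`, `confluentinc/cp-kafka`) and the remaining settings are assumptions, so your actual file may differ:

```yaml
# Minimal sketch, not necessarily this repository's exact file.
# Image names are an assumption; only the two highlighted
# environment variables are confirmed by this README.
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:6.0.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"   # expose the broker to the host
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Advertise localhost:9092 so clients on the host can connect.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # Single broker: the internal __consumer_offsets topic cannot be
      # replicated 3x, so force a replication factor of 1.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```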
The three Python files demonstrate the complete lifecycle of a message:
| File Name | Client Class Used | Purpose | Key Concept Demonstrated |
|---|---|---|---|
| `admin_client.py` | `KafkaAdminClient` | Creates the topic `quickstart_topic` (2 partitions, 1 replica). | Partitioning & Replication Factor |
| `producer.py` | `KafkaProducer` | Sends 5 structured messages with user IDs as keys. | Serialization (Python dict → JSON bytes) |
| `consumer.py` | `KafkaConsumer` | Reads all incoming messages from the topic. | Deserialization (JSON bytes → Python dict) |
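For reference, a hedged sketch of the core logic of these three scripts using kafka-python. The topic name, partition count, and message count come from this README; the payload fields and key format are illustrative, and the repository's actual files may differ:

```python
# Sketch of the three scripts' core logic, condensed into one file.
import json

from kafka import KafkaAdminClient, KafkaConsumer, KafkaProducer
from kafka.admin import NewTopic
from kafka.errors import TopicAlreadyExistsError

BOOTSTRAP = "localhost:9092"
TOPIC = "quickstart_topic"

# --- admin_client.py: create the topic (2 partitions, 1 replica) ---
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
try:
    admin.create_topics([NewTopic(name=TOPIC, num_partitions=2, replication_factor=1)])
except TopicAlreadyExistsError:
    pass  # safe to re-run

# --- producer.py: send 5 JSON messages keyed by user ID ---
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    key_serializer=lambda k: k.encode("utf-8"),                # str -> bytes
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # dict -> JSON bytes
)
for i in range(5):
    producer.send(TOPIC, key=f"user-{i}", value={"user_id": i, "action": "click"})
producer.flush()

# --- consumer.py: read and deserialize all messages from the topic ---
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",                                # start from the beginning
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),  # JSON bytes -> dict
)
for record in consumer:
    print(record.value)
```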
Follow these steps in sequence to bring up the environment and test the message flow.
- Start the Cluster:

  ```bash
  docker-compose up -d
  ```

- Verify Topic Creation: Run the admin script to ensure the destination topic exists.

  ```bash
  python admin_client.py
  ```

- Start the Consumer (New Terminal): Open a separate terminal window (and activate your venv) to run the consumer script. It will connect and wait.

  ```bash
  python consumer.py
  ```

- Run the Producer (Original Terminal): Execute the producer script to send 5 JSON messages.

  ```bash
  python producer.py
  ```
Verification: The consumer terminal should immediately start printing the received, deserialized Python dictionary messages.
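When you are done, or if something looks off along the way, the standard Compose commands apply (a usage note, not something specific to this repository):

```bash
docker-compose ps          # confirm both containers are Up
docker-compose logs kafka  # inspect broker logs if clients cannot connect
docker-compose down        # stop and remove the containers
```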
| Concept | Description | Significance |
|---|---|---|
| Offset | A unique, sequential ID assigned to each message within a single partition. | Acts as a cursor for consumers, telling them exactly where to resume reading after a stop/restart. |
| Partition | The primary unit of parallelism. Messages within the same partition are strictly ordered. | Determines message throughput and how many consumers can process data concurrently. |
| Replication Factor | The number of copies of a partition's data stored across the cluster. | Ensures fault tolerance; set to 1 in our local setup. |
| Serialization | The process of converting structured data (like a Python dictionary) into bytes before sending to Kafka. | Necessary because Kafka and the network only transmit raw byte arrays. |
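To see offsets and partitions concretely: each message kafka-python yields is a `ConsumerRecord` that carries exactly these coordinates. A small self-contained sketch, reusing the topic and deserializer from above:

```python
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "quickstart_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
# Each ConsumerRecord exposes the partition it was read from and its
# sequential offset within that partition.
for record in consumer:
    print(f"partition={record.partition} offset={record.offset} value={record.value}")
```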