
🚀 Kafka Quickstart: Docker, Python Client, and Core Concepts

This repository contains a minimal, single-broker Apache Kafka cluster defined with Docker Compose, plus scripts demonstrating the fundamental Producer, Consumer, and Admin operations with the kafka-python client library.

🎯 Goal

To run a single-broker Kafka cluster locally and practice sending and receiving structured JSON messages using the appropriate Python client configurations (serialization and deserialization).


πŸ› οΈ Prerequisites

  • Docker & Docker Compose (or Docker CLI with compose plugin)
  • Python 3.8+
  • Dependencies: Ensure you have installed the required client library in your virtual environment:
    pip install kafka-python
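
If you are starting from scratch, a typical environment setup looks like the sketch below (assuming a POSIX shell; the `.venv` name is illustrative):

```bash
python -m venv .venv           # create an isolated environment
source .venv/bin/activate      # on Windows: .venv\Scripts\activate
pip install kafka-python      # the client library used by all three scripts
```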

βš™οΈ 1. Cluster Setup and Configuration (docker-compose.yml)

The docker-compose.yml file defines our two services, zookeeper and kafka, both pinned to image version 6.0.0 so the ZooKeeper and Kafka images stay mutually compatible.

💡 Critical Configuration Concepts Addressed in docker-compose.yml:

  1. External Connectivity (KAFKA_ADVERTISED_LISTENERS): This is essential. The broker runs inside the Docker network, so it must advertise localhost:9092 as its externally reachable address; otherwise clients on your host machine receive an internal hostname they cannot resolve.
  2. Single-Broker Replication Fix (KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR): This overrides the default internal replication factor (3) down to 1, which is mandatory when running only one Kafka broker locally. It prevents repeated broker errors around the internal __consumer_offsets topic, which could otherwise never reach three replicas. Both settings appear in the sketch below.
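
For orientation, a minimal compose file implementing both points might look like this sketch. The service names, the 9092 port, and the two settings above come from this README; the Confluent image names, the internal listener, and all other values are assumptions based on common single-broker quickstarts, so the repository's actual file may differ.

```yaml
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0   # assumed image; version per the README
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:6.0.0       # assumed image; version per the README
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"                          # expose the broker to the host
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Concept 1: advertise localhost:9092 for host clients; the internal
      # kafka:29092 listener for in-network traffic is an assumed convention.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      # Concept 2: one broker means internal topics can only have one replica.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```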

💻 2. Client Operations and File Overview

The three Python files demonstrate the complete lifecycle of a message:

| File Name | Client Class Used | Purpose | Key Concept Demonstrated |
| --- | --- | --- | --- |
| admin_client.py | KafkaAdminClient | Creates the topic quickstart_topic (2 partitions, 1 replica). | Partitioning & replication factor |
| producer.py | KafkaProducer | Sends 5 structured messages with user IDs as keys. | Serialization (Python dict → JSON bytes) |
| consumer.py | KafkaConsumer | Reads all incoming messages from the topic. | Deserialization (JSON bytes → Python dict) |
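
The scripts in the repository are the source of truth; as a rough sketch of the core kafka-python calls each one makes (the topic name, partition count, and message count come from the table above, while the payload fields and key format are illustrative):

```python
import json

from kafka import KafkaConsumer, KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"

# admin_client.py -- create quickstart_topic with 2 partitions, 1 replica
# (raises TopicAlreadyExistsError on a second run).
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic("quickstart_topic", num_partitions=2, replication_factor=1)])

# producer.py -- serialize dicts to JSON bytes, keyed by user ID.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(5):
    producer.send("quickstart_topic", key=f"user_{i}", value={"user_id": f"user_{i}", "seq": i})
producer.flush()  # block until all five messages are actually sent

# consumer.py -- deserialize JSON bytes back into Python dicts.
consumer = KafkaConsumer(
    "quickstart_topic",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",  # start from the beginning on first run
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # a plain Python dict
```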

πŸƒ 3. Execution Steps Summary

Follow these steps in sequence to bring up the environment and test the message flow.

  1. Start the Cluster:
    docker-compose up -d
  2. Verify Topic Creation: Run the admin script to ensure the destination topic exists.
    python admin_client.py
  3. Start the Consumer (New Terminal): Open a separate terminal window (and activate your venv) to run the consumer script. It will connect and wait.
    python consumer.py
  4. Run the Producer (Original Terminal): Execute the producer script to send 5 JSON messages.
    python producer.py

    Verification: The consumer terminal should immediately start printing the received, deserialized Python dictionary messages.
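
Optionally, you can confirm the topic's partition layout from inside the broker container. This assumes a Confluent-based image, which ships the standard kafka-topics tool; adjust the service name if yours differs:

```bash
docker-compose exec kafka kafka-topics \
  --bootstrap-server localhost:9092 \
  --describe --topic quickstart_topic
```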


💡 4. Core Kafka Concepts Review

| Concept | Description | Significance |
| --- | --- | --- |
| Offset | A unique, sequential ID assigned to each message within a single partition. | Acts as a cursor for consumers, telling them exactly where to resume reading after a stop/restart. |
| Partition | The primary unit of parallelism. Messages within the same partition are strictly ordered. | Determines message throughput and how many consumers can process data concurrently. |
| Replication Factor | The number of copies of a partition's data stored across the cluster. | Ensures fault tolerance; set to 1 in our local setup. |
| Serialization | The process of converting structured data (like a Python dictionary) into bytes before sending to Kafka. | Necessary because Kafka and the network only transmit raw byte arrays. |
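
To see offsets and partitions concretely, every record the consumer yields carries this metadata as kafka-python ConsumerRecord attributes; a small illustrative tweak to the consumer loop:

```python
for message in consumer:
    # Offsets are sequential within each partition, and messages with the
    # same key always land in the same partition, preserving per-key order.
    print(f"partition={message.partition} offset={message.offset} "
          f"key={message.key} value={message.value}")
```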
