Skip to content

Latest commit

 

History

History
442 lines (309 loc) · 17.5 KB

File metadata and controls

442 lines (309 loc) · 17.5 KB

Example Output: Python URL Shortener — Phase 1, Session 1

This file shows what the agent produces after completing the intake conversation. It covers the learner profile summary, confirmed stack, curriculum outline, and one complete session block. The learner is a mid-level backend developer building a URL shortener as a portfolio project.


Learner Profile (from intake)

Field Value
Project URL Shortener — a service that takes a long URL and returns a short alias, tracks click counts, and expires links after N days
Domain Web API
Target outcome Portfolio piece; realistic enough to deploy
Known languages Java (comfortable), Python (beginner)
Learning intent Learn Python by building something real — not just syntax exercises
Weekly hours ~10 hrs/week, prefers 2–3 hr sessions on evenings
Constraints No paid cloud services; must run locally with Docker

Confirmed language: Python Rationale: The project is a greenfield web API — FastAPI's async model and minimal boilerplate makes it a stronger teaching vehicle for Python idioms than a Java/Spring equivalent would be for someone learning Python as a second language. The learner's Java background means they already understand OOP; the goal is to learn Python's conventions, not restart from type systems.


Confirmed Stack

Layer Tool
Web framework FastAPI
Database SQLAlchemy (async) + asyncpg + PostgreSQL
Testing pytest + httpx (async test client)
Containerization Docker + docker-compose
Task queue Celery + Redis (Phase 4)
Linting / format ruff, black
Package manager pip / venv

Curriculum Outline (approved)

PROJECT:   url-shortener — a FastAPI service that creates short links, tracks clicks, and expires them
LANGUAGE:  Python
STACK:     FastAPI, SQLAlchemy async, asyncpg, PostgreSQL, pytest, Docker
PHASES:    4 phases
SESSIONS:  3–4 sessions per phase
TOTAL:     ~40 days × 2.5 hrs avg = ~100 learning hours
Phase 1: Python Foundations & Project Bootstrap — 10 days, ~25 hours
  Session 1: Days 1–3  — Python environment, types, functions, project layout
  Session 2: Days 4–6  — Classes, modules, exceptions, the url_shortener package
  Session 3: Days 7–10 — FastAPI basics: routes, models, dependency injection

Phase 2: Persistence & Data Layer — 10 days, ~25 hours
  Session 4: Days 11–13 — SQLAlchemy async setup, models, migrations (Alembic)
  Session 5: Days 14–16 — CRUD operations, async sessions, repository pattern
  Session 6: Days 17–20 — Testing with pytest: fixtures, async tests, DB isolation

Phase 3: Business Logic & API Hardening — 12 days, ~30 hours
  Session 7: Days 21–24 — Short code generation, collision handling, expiry logic
  Session 8: Days 25–28 — Click tracking, rate limiting, error handling middleware
  Session 9: Days 29–32 — Full test suite, coverage, integration tests

Phase 4: Production Readiness — 8 days, ~20 hours
  Session 10: Days 33–36 — Docker + docker-compose, environment config, secrets
  Session 11: Days 37–40 — Background tasks with Celery (expiry cleanup), observability basics

Phase 1: Python Foundations & Project Bootstrap

Phase 1 of 4 | Language: Python | Duration: 10 days (~25 hours) | Prerequisites: none

By the end of this phase, you will have a working Python project layout for the URL shortener — a proper package structure, a virtual environment, typed functions, and the project running its first FastAPI route. This phase matters because every subsequent phase builds on this foundation; a poorly structured Python project is hard to extend, test, or containerize later.

Each session follows: Reading → Build → Review (checklist + quiz) → Exercises → Further Reading.


Days 1–3: Python Environment, Types, and Functions

By the end of this session you can set up a Python virtual environment, write typed functions, and run a script that encodes a URL into a short code. (3 hours)


Reading: Virtual Environments and Project Isolation

Python does not isolate dependencies per-project by default — packages installed with pip go into a global site-packages directory. This causes version conflicts across projects. A virtual environment (venv) creates a local copy of the Python interpreter and its package directory, scoped to one project.

python3 -m venv .venv
source .venv/bin/activate   # macOS / Linux
# .venv\Scripts\activate    # Windows
pip install fastapi

After activation, python and pip resolve to the local .venv/ copies. Deactivate with deactivate.

💡 Tip

Always create .venv at the project root and add it to .gitignore. Never commit a virtual environment — it contains paths specific to your machine.


Reading: Type Hints

Python is dynamically typed, but since Python 3.5 you can annotate variables and function signatures with type hints. These annotations are not enforced at runtime — they are metadata for static analysis tools (mypy, ruff) and for readers.

def shorten(url: str, length: int = 8) -> str:
    """Return a short alphanumeric code derived from url."""
    ...

The return type annotation -> str documents intent. If you return an int by mistake, mypy will flag it — but Python will not raise an error at runtime.

# Type hints work with built-in generics (Python 3.9+)
def batch_shorten(urls: list[str]) -> dict[str, str]:
    return {url: shorten(url) for url in urls}

⚠️ Warning

Type hints do not make Python statically typed. A function annotated -> str can still return None at runtime if you forget a return path. Use a type checker (mypy --strict) to catch this.


Reading: Functions and the hashlib Module

Python functions are first-class objects — they can be passed as arguments, stored in variables, and returned from other functions. For now, focus on the basics: def, default arguments, and return values.

The standard library's hashlib module provides cryptographic hash functions. For URL shortening, you can hash the URL and take a slice of the hex digest as the short code.

import hashlib

def shorten(url: str, length: int = 8) -> str:
    digest = hashlib.sha256(url.encode()).hexdigest()
    return digest[:length]

url.encode() converts the str to bytes because hashlib functions require bytes, not strings. hexdigest() returns the hash as a lowercase hex string.

💡 Tip

sha256 is not the only option, but it is collision-resistant and fast enough for this use case. hashlib.md5 is faster but cryptographically broken — fine for non-security uses, but sha256 is a better habit.


Build: Project Layout and First Short-Code Function

Create the initial project layout and a working shorten() function with a test you can run from the command line.

mkdir -p url_shortener
touch url_shortener/__init__.py
touch url_shortener/codec.py
touch main.py
touch requirements.txt

url_shortener/codec.py:

import hashlib


def shorten(url: str, length: int = 8) -> str:
    """Return a URL-safe short code derived from url.

    Args:
        url: The full URL to shorten.
        length: Number of characters in the output code (default 8).

    Returns:
        A lowercase hex string of the requested length.
    """
    if not url:
        raise ValueError("url must not be empty")
    digest = hashlib.sha256(url.encode()).hexdigest()
    return digest[:length]

main.py:

from url_shortener.codec import shorten

if __name__ == "__main__":
    test_url = "https://example.com/very/long/path?query=true"
    code = shorten(test_url)
    print(f"Short code: {code}")
    print(f"Length: {len(code)}")

Run it:

python main.py
# Short code: 3f4b8c1e
# Length: 8

requirements.txt:

fastapi==0.111.0
uvicorn[standard]==0.29.0
git commit -m "bootstrap: add project layout and shorten() codec function"

Review: Days 1–3

Daily Checklist

  • I can create and activate a Python virtual environment
  • I understand why virtual environments exist and what .venv/ contains
  • I can write a typed function with default arguments and a docstring
  • I understand that type hints are not enforced at runtime
  • url_shortener/ is a package (has __init__.py) and codec.py is importable from it
  • shorten("https://example.com") returns a consistent 8-character hex string

Quiz

Q1: What does url.encode() return when url is the string "https://example.com"?

  • A) A list of ASCII codes
  • B) A bytes object containing the UTF-8 encoded string
  • C) A str object with escape sequences
  • D) An integer representation of the URL
Answer

Bstr.encode() returns a bytes object using UTF-8 encoding by default. hashlib functions require bytes, not str — passing a str directly raises TypeError. Options A and D are wrong types entirely; C describes something like repr(), not .encode().


Q2: A colleague adds -> None to your shorten() function signature but does not change the body. What happens when you call it?

  • A) Python raises TypeError because the return type does not match
  • B) The function still returns a str — type hints are not enforced at runtime
  • C) The function returns None because the annotation overrides the body
  • D) mypy will fix the annotation automatically
Answer

B — Type hints are metadata only. Python does not enforce them at runtime. The function body still executes normally and returns whatever the return statement produces. A static type checker like mypy would flag the mismatch — but only if you run it. Option D is false; mypy reports errors, it does not rewrite code.


Q3: You run pip install fastapi without activating a virtual environment. Where does the package get installed?

  • A) Nowhere — pip refuses to install without a venv
  • B) Into the current directory
  • C) Into the system-wide Python's site-packages directory
  • D) Into .venv/lib/ of the nearest parent directory containing a .venv/
Answer

C — Without an active virtual environment, pip installs into the system Python's site-packages. This can cause version conflicts with other projects or system tools. Option A is incorrect — pip does not require a venv. Options B and D describe behaviors that do not exist in standard Python.


Q4: Which of the following is true about hashlib.sha256(url.encode()).hexdigest()[:8]?

  • A) Two different URLs will never produce the same 8-character prefix
  • B) The same URL always produces the same 8-character prefix
  • C) The output changes every time because sha256 is random
  • D) The output is Base64-encoded
Answer

B — SHA-256 is a deterministic function: the same input always produces the same output. The 8-character slice is therefore stable and repeatable. Option A is false — truncating to 8 hex characters means only 4 billion possible codes, so collisions are possible with enough URLs. Option C is false — there is no randomness. Option D is false — hexdigest() returns lowercase hexadecimal, not Base64.


Exercises

Easy: Validate URL format before hashing

Extend shorten() to raise ValueError if url does not start with http:// or https://. Write a second function is_valid_url(url: str) -> bool that shorten() uses internally.

Solution
def is_valid_url(url: str) -> bool:
    return url.startswith(("http://", "https://"))


def shorten(url: str, length: int = 8) -> str:
    if not url:
        raise ValueError("url must not be empty")
    if not is_valid_url(url):
        raise ValueError(f"url must start with http:// or https://, got: {url!r}")
    digest = hashlib.sha256(url.encode()).hexdigest()
    return digest[:length]

str.startswith() accepts a tuple of prefixes — this is idiomatic Python. The !r format specifier in the f-string calls repr() on the value, which wraps strings in quotes and makes the error message unambiguous when the input contains whitespace or is empty.


Medium: Parameterize the character set

SHA-256's hexdigest() only uses characters 0-9a-f. Extend the codec to support a Base62 encoding (0-9A-Za-z) so short codes look like aB3xY9kZ instead of 3f4b8c1e. Add a charset parameter to shorten() that accepts "hex" or "base62".

Solution
import hashlib
import string

BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
_CHARSETS = {"hex", "base62"}


def _to_base62(number: int, length: int) -> str:
    result = []
    while number and len(result) < length:
        result.append(BASE62[number % 62])
        number //= 62
    while len(result) < length:
        result.append(BASE62[0])
    return "".join(reversed(result))


def shorten(url: str, length: int = 8, charset: str = "hex") -> str:
    if not url:
        raise ValueError("url must not be empty")
    if charset not in _CHARSETS:
        raise ValueError(f"charset must be one of {_CHARSETS}, got: {charset!r}")
    digest = hashlib.sha256(url.encode())
    if charset == "hex":
        return digest.hexdigest()[:length]
    return _to_base62(int(digest.hexdigest(), 16), length)

int(digest.hexdigest(), 16) converts the hex string to a Python integer (arbitrary precision), which _to_base62 then converts to Base62. Python integers have no size limit, so this works for any SHA-256 output. The string module provides the character constants — avoid hardcoding "0123456789ABC..." by hand.


Challenge: Collision-resistant short codes

The current shorten() is deterministic — the same URL always produces the same code. But two different URLs could hash to the same 8-character prefix. Design a CodecRegistry class that:

  1. Stores a mapping of short_code → url in memory (a dict)
  2. Generates a short code for a URL using shorten()
  3. Detects collisions: if the generated code is already mapped to a different URL, appends a counter suffix and retries until the code is unique
  4. Returns the existing code if the URL was already registered
  5. Exposes encode(url) -> str and decode(short_code) -> str | None methods
Solution
import hashlib


class CodecRegistry:
    def __init__(self, length: int = 8) -> None:
        self._length = length
        self._code_to_url: dict[str, str] = {}
        self._url_to_code: dict[str, str] = {}

    def encode(self, url: str) -> str:
        if not url:
            raise ValueError("url must not be empty")
        # Return existing code if URL is already registered
        if url in self._url_to_code:
            return self._url_to_code[url]

        attempt = 0
        while True:
            seed = f"{url}:{attempt}"
            code = hashlib.sha256(seed.encode()).hexdigest()[: self._length]
            if code not in self._code_to_url:
                # Slot is free — claim it
                self._code_to_url[code] = url
                self._url_to_code[url] = code
                return code
            if self._code_to_url[code] == url:
                # Same URL somehow reached by a different path — safe
                return code
            # Collision with a different URL — retry with incremented seed
            attempt += 1

    def decode(self, short_code: str) -> str | None:
        return self._code_to_url.get(short_code)

The attempt counter changes the hash input (f"{url}:{attempt}") without modifying the URL itself. This makes collision resolution deterministic and reproducible — given the same registry state, the same URL always produces the same code. The _url_to_code reverse index makes encode() O(1) for repeated calls with the same URL. str | None is the Python 3.10+ union syntax — use Optional[str] from typing for older versions.


Further Reading

  • Python hashlib — Standard Library Docs — Covers every hash algorithm available in hashlib, the BLAKE2 variants, and when to prefer one over another; read it when choosing a hash function for a new feature.
  • PEP 484 — Type Hints — The original proposal that introduced type hints; covers the rationale, what is and is not enforced, and how typing generics work — read it to understand why type hints were designed as optional metadata rather than enforced constraints.

Phase 1 Complete: What You Have Built

Daily Checklist

  • Virtual environment created, activated, and added to .gitignore
  • url_shortener/ package with __init__.py in place
  • codec.py with a typed, documented shorten() function
  • shorten() validates input and raises ValueError for bad input
  • main.py runs without error and prints a consistent short code
  • requirements.txt contains fastapi and uvicorn
  • Git history has at least one commit with a meaningful message

📚 Concepts to Deepen on Your Own

• Python's data model (__repr__, __str__, __eq__) — understanding these makes your classes behave like built-in types • pathlib.Path — idiomatic file path handling in Python, more readable than os.pathargparse — adding CLI flags to main.py so you can call python main.py --url https://example.com