Open
23 commits
37d3c25
Add Docker Compose configuration for local dev MariaDB setup
Agamya-Samuel Apr 14, 2025
14a69eb
Refactor quote fetching functionality to retrieve the quote of the da…
Agamya-Samuel Apr 14, 2025
d1a3f84
Add created_at and updated_at fields to QuoteSchema for better tracki…
Agamya-Samuel Apr 14, 2025
ccea392
Add created_at and updated_at fields to Quote model for improved trac…
Agamya-Samuel Apr 14, 2025
f8cf05e
Implement logging in database initialization for better error trackin…
Agamya-Samuel Apr 14, 2025
eef7e0d
Add index to featured_date in Quote model for improved query performance
Agamya-Samuel Apr 14, 2025
b3ceaf4
Refactor Quote schema to introduce QuoteBase for shared fields and en…
Agamya-Samuel Apr 14, 2025
4eda777
Rename get_quote_of_the_day to get_quote_of_the_day_route for clarity…
Agamya-Samuel Apr 14, 2025
ac50e1b
Update response models in quote routes to use Quote instead of QuoteS…
Agamya-Samuel Apr 14, 2025
1ae3dc2
Add created_at and updated_at fields to the quote extraction function…
Agamya-Samuel Apr 14, 2025
e653416
Add logging configuration to CRUD operations for improved debugging
Agamya-Samuel Apr 14, 2025
f73a243
Add create_multiple_quotes function to handle batch quote creation wi…
Agamya-Samuel Apr 14, 2025
83cec18
Add environment variables for Historical Featured Quotes App configur…
Agamya-Samuel Apr 15, 2025
d8c3260
Change quote and author fields to Text type for improved flexibility …
Agamya-Samuel Apr 15, 2025
c087401
Refactor database configuration to remove default values for improved…
Agamya-Samuel Apr 15, 2025
39d398d
Add initial implementation of the Historical Featured Quotes App
Agamya-Samuel Apr 15, 2025
f9b30eb
Enhance create_multiple_quotes function to support dynamic ID generat…
Agamya-Samuel Apr 15, 2025
24dc6eb
Update unique ID generation in extract_quote function to include feat…
Agamya-Samuel Apr 15, 2025
4d8e4d6
Update README.md to include Docker support and clarify database setup…
Agamya-Samuel Apr 15, 2025
3ad9b43
Add README.md for Historical Quote Backfill Tool with usage instructi…
Agamya-Samuel Apr 15, 2025
a6c5d97
Enhance README.md and backfill tool documentation with usage instruct…
Agamya-Samuel Apr 15, 2025
1870c93
Merge pull request #2 from indictechcom/main
Agamya-Samuel Apr 15, 2025
d02b16e
Merge branch 'feature/backfill-historical-featured-quotes' of https:/…
Agamya-Samuel Apr 15, 2025
7 changes: 6 additions & 1 deletion .env.example
@@ -3,4 +3,9 @@ DB_HOST=
DB_USER=
DB_PASSWORD=
DB_NAME=
DB_PORT=
DB_PORT=

# Backfill Historical Featured Quotes App
WIKIQUOTE_BASE_URL='https://en.wikiquote.org/wiki/'
MAX_RETRIES=3
TIMEOUT_SECONDS=10
83 changes: 69 additions & 14 deletions README.md
@@ -10,20 +10,21 @@ A FastAPI-based REST API that fetches and stores daily quotes from Wikiquote. Th
- Filter quotes by author
- Pagination support
- MariaDB/MySQL database storage
- Docker support for development database (docker-compose-dev-db.yml)
- Clean and modular code structure

## Prerequisites

- Python 3.10+
- pip (Python package manager)
- MariaDB/MySQL database
- MariaDB/MySQL database (or Docker for running the development database locally)

## Installation

1. Clone the repository:

```bash
git clone https://github.com/Agamya-Samuel/wq-qotd.git
git clone https://github.com/indictechcom/wq-qotd.git
cd wq-qotd
```

@@ -54,23 +55,32 @@ A FastAPI-based REST API that fetches and stores daily quotes from Wikiquote. Th
1. Create a `.env` file in the project root using `.env.example` as a template:

```env
# Database Configuration
DB_HOST=localhost
DB_PORT=3306
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=your_database
DB_PORT=3306

# Backfill Historical Featured Quotes App
WIKIQUOTE_BASE_URL='https://en.wikiquote.org/wiki/'
MAX_RETRIES=3
TIMEOUT_SECONDS=10
```
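Because this change set removes the default values from the database settings, a missing variable now surfaces at startup rather than silently falling back to a default. The following is a minimal sketch of a fail-fast loader for the variables above; `load_db_settings` is a hypothetical helper for illustration, not part of the project.

```python
import os
from typing import Mapping, Optional

REQUIRED_DB_VARS = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME", "DB_PORT")


def load_db_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the DB_* variables and fail with a clear message."""
    env = os.environ if env is None else env

    # Collect every missing name so the error reports all of them at once.
    missing = [name for name in REQUIRED_DB_VARS if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    settings = {name: env[name] for name in REQUIRED_DB_VARS}
    # int() raises ValueError if DB_PORT is empty or not numeric.
    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.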

2. Create the database:
## Docker Development Database

```sql
CREATE DATABASE your_database;
```
For local development, you can use the provided Docker Compose setup to run a MariaDB instance:

3. Initialize the database:
```bash
python -m app.database.init_db
```
```bash
docker-compose -f docker-compose-dev-db.yml up -d
```

This will start a MariaDB instance with the following configuration:

- Database: s56492\_\_wq-qotd-db
- Port: 32768:3306 (host port 32768 mapped to container port 3306)
- Root Password: root
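
As a rough illustration of how these values fit together, the sketch below assembles a SQLAlchemy-style connection URL for the container. The `mysql+mysqlconnector` scheme, the `root` user, and the `build_dev_db_url` helper are assumptions made for this example; the project's own URL construction is not shown in this diff.

```python
from urllib.parse import quote_plus


def build_dev_db_url(
    user: str = "root",                    # assumed user for the dev container
    password: str = "root",                # root password listed above
    host: str = "localhost",
    port: int = 32768,                     # host side of the 32768:3306 mapping
    database: str = "s56492__wq-qotd-db",
) -> str:
    """Hypothetical helper: build a connection URL for the dev database."""
    # quote_plus guards against special characters in credentials or names.
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{quote_plus(database)}"
    )


print(build_dev_db_url())
```

When connecting from the host machine, the port to use is the host side of the mapping (32768), not 3306.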

## Running the Application

@@ -86,6 +96,7 @@ A FastAPI-based REST API that fetches and stores daily quotes from Wikiquote. Th

## API Endpoints

- `GET /`: List all available routes
- `GET /api/quote_of_the_day`: Get today's quote
- `GET /api/quotes/{date}`: Get quote by date (YYYY-MM-DD)
- `GET /api/quotes`: Get all quotes (with pagination and author filter)
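
The endpoints above can be addressed from a client roughly as follows. This is a small sketch that only builds the request URLs; the base URL `http://localhost:8000` and the helper names are assumptions for illustration, while the `page`, `limit`, and `author` parameters come from the route signature in this change set.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed local address, adjust as needed


def quotes_url(page: int = 1, limit: int = 10, author: Optional[str] = None) -> str:
    """Hypothetical helper: URL for the paginated quote listing."""
    params = {"page": page, "limit": limit}
    if author:
        params["author"] = author
    return f"{BASE_URL}/api/quotes?{urlencode(params)}"


def quote_by_date_url(date: str) -> str:
    """Hypothetical helper: URL for a single quote, date given as YYYY-MM-DD."""
    return f"{BASE_URL}/api/quotes/{date}"


print(quotes_url(page=2, author="Mark Twain"))
print(quote_by_date_url("2025-04-14"))
```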
@@ -112,10 +123,21 @@ wq-qotd/
│ └── schemas/
│ ├── __init__.py
│ └── schemas.py
├── backfill_historical_featured_quotes_app/
│ ├── __init__.py
│ ├── __main__.py
│ ├── main.py
│ ├── quotes-extraction-config.json
│ ├── README.md
│ └── core/
│ ├── config.py
│ └── utils.py
├── main.py
├── docker-compose-dev-db.yml
├── requirements.txt
├── .env
├── .env.example
├── .env.production
└── README.md
```

@@ -132,15 +154,46 @@ wq-qotd/
- `models.py`: SQLAlchemy models
- `init_db.py`: Database initialization
- `schemas/`: Pydantic models for request/response validation
- `backfill_historical_featured_quotes_app/`: Utility for populating historical quotes
- `docker-compose-dev-db.yml`: Docker Compose configuration for local development database

## Historical Quote Backfill Tool

The project includes a specialized utility module (`backfill_historical_featured_quotes_app`) designed to populate the database with historical quotes from Wikiquote archives. This tool is particularly useful for:

- Initial database setup with quotes dating back to 2007
- Repopulating the database after a reset
- Adding missing historical quotes

### Features of the Backfill Tool

- Scrapes and processes Wikiquote's monthly quote archives
- Handles parsing of multiple HTML formats (Wikiquote changed its format in 2012)
- Uses asyncio for concurrent processing of multiple months/years
- Properly formats date information for database storage
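
The concurrent processing of months and years mentioned above could look roughly like the sketch below. It is not the tool's actual code: `backfill`, `fetch_month`, and the `max_concurrency` limit are hypothetical names, and the real page fetcher is replaced by a caller-supplied coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

FetchFn = Callable[[int, int], Awaitable[str]]


async def fetch_month(year: int, month: int, fetch: FetchFn,
                      semaphore: asyncio.Semaphore) -> str:
    # The semaphore caps how many archive pages are requested at once.
    async with semaphore:
        return await fetch(year, month)


async def backfill(years: Iterable[int], fetch: FetchFn,
                   max_concurrency: int = 5) -> List[str]:
    """Hypothetical driver: process every month of every year concurrently."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [
        fetch_month(year, month, fetch, semaphore)
        for year in years
        for month in range(1, 13)
    ]
    # gather returns results in the same order the tasks were listed.
    return await asyncio.gather(*tasks)


async def fake_fetch(year: int, month: int) -> str:
    # Stand-in for a real HTTP request to a monthly archive page.
    await asyncio.sleep(0)
    return f"{year}-{month:02d}"


if __name__ == "__main__":
    print(asyncio.run(backfill([2007, 2008], fake_fetch))[:3])
```

Bounding concurrency with a semaphore is one way to avoid sending too many simultaneous requests to the archive site.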

### Running the Backfill Tool

To populate your database with historical quotes, run the following command from the project root after activating your virtual environment:

```bash
python -m backfill_historical_featured_quotes_app
```

This process may take several minutes depending on your internet connection, as it fetches and processes many pages of quotes.

For more details about the backfill tool, see the [dedicated README](/backfill_historical_featured_quotes_app/README.md) in the backfill module directory.

## Data Model

### Quote

- `id`: String (32 chars) - MD5 hash of quote and author
- `quote`: String (1000 chars) - The quote text
- `author`: String (255 chars) - The quote's author
- `id`: String (32 chars) - MD5 hash of quote, author and featured_date
- `quote`: String - The quote text
- `author`: String - The quote's author
- `featured_date`: Date - The date the quote was featured (YYYY-MM-DD)
- `created_at`: DateTime - When the quote record was created (YYYY-MM-DD HH:MM:SS)
- `updated_at`: DateTime - When the quote record was last updated (YYYY-MM-DD HH:MM:SS)

## Development

@@ -172,3 +225,5 @@ This project is licensed under the MIT License - see the LICENSE file for detail
- [SQLAlchemy](https://www.sqlalchemy.org/) for the ORM
- [Pydantic](https://pydantic-docs.helpmanual.io/) for data validation
- [MariaDB](https://mariadb.org/) for the database system
- [Docker](https://www.docker.com/) for containerization support
- [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/) for web scraping functionality
23 changes: 16 additions & 7 deletions app/api/routers/quotes.py
@@ -2,9 +2,9 @@
from sqlalchemy.orm import Session
from typing import List, Optional
from app.database.models import SessionLocal
from app.schemas.schemas import QuoteSchema
from app.schemas.schemas import Quote
from app.database.crud import add_quote_to_db, get_quote_by_date, get_all_quotes
from app.core.utils import fetch_quote_of_the_day
from app.core.utils import fetch_quote_of_the_day_from_api
from datetime import datetime

# Router instance
@@ -21,21 +21,30 @@ def get_db():
finally:
db.close()

@router.get("/quote_of_the_day", response_model=QuoteSchema)
async def get_quote_of_the_day(db: Session = Depends(get_db)):
quote_data = fetch_quote_of_the_day()
@router.get("/quote_of_the_day", response_model=Quote)
async def get_quote_of_the_day_route(db: Session = Depends(get_db)):
"""
Return the Quote of the Day for the `/api/quote_of_the_day` route.

The quote is first fetched from the Wikiquote API, then the database is checked for an entry with the same featured date.

If no entry exists for that date, the fetched quote is added to the database and returned.

If an entry already exists, the stored quote is returned.
"""
quote_data = fetch_quote_of_the_day_from_api()
quote = get_quote_by_date(db, quote_data["featured_date"])
if not quote:
quote = add_quote_to_db(db, quote_data)
return quote

@router.get("/quotes", response_model=List[QuoteSchema])
@router.get("/quotes", response_model=List[Quote])
async def get_all_quotes_route(
page: int = 1, limit: int = 10, author: Optional[str] = None, db: Session = Depends(get_db)
):
return get_all_quotes(db, page, limit, author)

@router.get("/quotes/{date}", response_model=QuoteSchema)
@router.get("/quotes/{date}", response_model=Quote)
async def get_quote_by_date_route(date: str, db: Session = Depends(get_db)):
try:
target_date = datetime.strptime(date, "%Y-%m-%d")
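
The fetch-then-store flow described in the `get_quote_of_the_day_route` docstring is a get-or-create pattern keyed on the featured date. The sketch below mirrors that flow with an in-memory dictionary standing in for the database; `get_or_create_quote` and the `store` argument are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Dict


def get_or_create_quote(store: Dict[str, dict], fetch: Callable[[], dict]) -> dict:
    """Hypothetical helper: return the stored quote for a date, adding it if absent."""
    quote_data = fetch()
    key = quote_data["featured_date"]

    existing = store.get(key)
    if existing is not None:
        # A quote for this date is already stored, so return the stored copy.
        return existing

    # First request for this date: persist the fetched quote, then return it.
    store[key] = quote_data
    return quote_data
```

Note that, as in the route, the upstream fetch happens on every call; only the write is skipped when an entry for the date already exists.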
10 changes: 5 additions & 5 deletions app/core/config.py
@@ -6,11 +6,11 @@

class Settings(BaseSettings):
# Database settings
DB_HOST: str = os.getenv("DB_HOST", "localhost")
DB_USER: str = os.getenv("DB_USER", "root")
DB_PASSWORD: str = os.getenv("DB_PASSWORD", "")
DB_NAME: str = os.getenv("DB_NAME", "qotd")
DB_PORT: int = int(os.getenv("DB_PORT", "3306"))
DB_HOST: str = os.getenv("DB_HOST")
DB_USER: str = os.getenv("DB_USER")
DB_PASSWORD: str = os.getenv("DB_PASSWORD")
DB_NAME: str = os.getenv("DB_NAME")
DB_PORT: int = int(os.getenv("DB_PORT"))

# for mysql/mariadb
@property
9 changes: 6 additions & 3 deletions app/core/utils.py
@@ -1,7 +1,7 @@
import hashlib
import requests
from bs4 import BeautifulSoup
from datetime import datetime
from datetime import datetime, timezone
import re

QOTD_API = 'https://en.wikiquote.org/w/api.php'
@@ -13,7 +13,7 @@
"formatversion": "2"
}

def fetch_quote_of_the_day() -> dict:
def fetch_quote_of_the_day_from_api() -> dict:
response = requests.get(QOTD_API, params=PARAMS)
response.raise_for_status()
data = response.json()
@@ -44,11 +44,14 @@ def extract_quote(html_content: str) -> dict:
if quote and author and featured_date:
break

unique_id = hashlib.md5(f"{quote}_{author}".encode()).hexdigest()
unique_id = hashlib.md5(f"{quote}_{author}_{featured_date}".encode()).hexdigest()
current_time = datetime.now(timezone.utc).isoformat()

return {
"id": unique_id,
"quote": quote,
"author": author,
"featured_date": featured_date,
"created_at": current_time,
"updated_at": current_time
}
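
The ID change in this hunk adds `featured_date` to the hashed string, so the same quote featured on two different dates gets two distinct IDs. A minimal standalone sketch of that derivation follows; `make_quote_id` is a hypothetical name, while the format string matches the one in the diff.

```python
import hashlib


def make_quote_id(quote: str, author: str, featured_date: str) -> str:
    """Hypothetical helper: 32-character MD5 hex digest used as the quote ID."""
    raw = f"{quote}_{author}_{featured_date}"
    return hashlib.md5(raw.encode()).hexdigest()


same_day = make_quote_id("Example quote", "Example Author", "2025-04-14")
other_day = make_quote_id("Example quote", "Example Author", "2025-04-15")
print(len(same_day), same_day != other_day)
```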
67 changes: 62 additions & 5 deletions app/database/crud.py
@@ -1,17 +1,25 @@
from datetime import datetime, date
from datetime import datetime, date, timezone
from sqlalchemy.orm import Session
from app.database.models import Quote
from typing import Optional, List
from typing import Optional, List, Any
from sqlalchemy import func
# import uuid
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def add_quote_to_db(db: Session, quote_data: dict) -> Quote:
featured_date = datetime.fromisoformat(quote_data['featured_date']).date()
current_time = datetime.now(timezone.utc)

quote = Quote(
id=quote_data['id'],
quote=quote_data['quote'],
author=quote_data['author'],
featured_date=featured_date
featured_date=featured_date,
created_at=current_time,
updated_at=current_time
)
db.add(quote)
db.commit()
@@ -26,9 +34,58 @@ def get_quote_by_date(db: Session, target_date: str):
target_date_obj = datetime.fromisoformat(target_date).date()
return db.query(Quote).filter(Quote.featured_date == target_date_obj).first()


def get_all_quotes(db: Session, page: int, limit: int, author: Optional[str] = None) -> List[Quote]:
query = db.query(Quote)
if author:
query = query.filter(Quote.author.ilike(f"%{author}%"))
return query.offset((page - 1) * limit).limit(limit).all()

def create_multiple_quotes(db: Session, quotes: List[Any]) -> List[Quote]:
"""
Create multiple quotes in the database from a list of Pydantic QuoteCreate objects or plain dicts.

Args:
db: SQLAlchemy database session
quotes: List of QuoteCreate objects

Returns:
List of created Quote objects
"""
try:
# Create SQLAlchemy model instances from Pydantic models
current_time = datetime.now(timezone.utc)
db_quotes = []

for quote in quotes:
# Convert Pydantic model to dict
try:
quote_dict = quote.model_dump()
except AttributeError:
# If it's already a dict
quote_dict = quote

# Generate an ID if not provided
if 'id' not in quote_dict:
# Create a hash of the quote text, author, and featured date to ensure uniqueness
unique_id = hashlib.md5(f"{quote_dict['quote']}_{quote_dict['author']}_{quote_dict['featured_date']}".encode()).hexdigest()
quote_dict['id'] = unique_id

# Set timestamps if not provided
if 'created_at' not in quote_dict:
quote_dict['created_at'] = current_time
if 'updated_at' not in quote_dict:
quote_dict['updated_at'] = current_time

db_quote = Quote(**quote_dict)
db_quotes.append(db_quote)

# Add all quotes to the session
db.add_all(db_quotes)
# Commit the transaction to persist the objects
db.commit()

return db_quotes
except Exception as e:
db.rollback() # Roll back on error
logger.error(f"Error creating quotes: {str(e)}")
raise
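
The per-quote preparation inside `create_multiple_quotes` can be isolated into a small pure function, which makes it easier to test without a database session. The sketch below is one possible variant, not the code in this PR: `normalize_quote` is a hypothetical name, it copies dict input instead of mutating it, and it treats a value of `None` the same as a missing key (an assumption, since whether the Pydantic model declares optional `id` or timestamp fields is not shown in this diff).

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Optional


def normalize_quote(quote: Any, now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: turn a Pydantic-like object or dict into model kwargs."""
    now = now or datetime.now(timezone.utc)

    if hasattr(quote, "model_dump"):
        data = quote.model_dump()   # Pydantic v2 models
    else:
        data = dict(quote)          # copy so the caller's dict is not mutated

    # Treat None like "not provided" so optional fields are still filled in.
    if data.get("id") is None:
        raw = f"{data['quote']}_{data['author']}_{data['featured_date']}"
        data["id"] = hashlib.md5(raw.encode()).hexdigest()
    if data.get("created_at") is None:
        data["created_at"] = now
    if data.get("updated_at") is None:
        data["updated_at"] = now
    return data
```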
16 changes: 10 additions & 6 deletions app/database/init_db.py
@@ -2,6 +2,10 @@
from app.core.config import settings
import mysql.connector
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def wait_for_db():
max_retries = 10
@@ -18,15 +22,15 @@ def wait_for_db():
database=settings.DB_NAME
)
conn.close()
print("Successfully connected to the database!")
logger.info("Successfully connected to the database!")
return True
except mysql.connector.Error as e:
print(f"Database connection attempt {attempt + 1} failed: {str(e)}")
logger.error(f"Database connection attempt {attempt + 1} failed: {str(e)}")
if attempt < max_retries - 1:
print(f"Retrying in {retry_interval} seconds...")
logger.info(f"Retrying in {retry_interval} seconds...")
time.sleep(retry_interval)
else:
print("Maximum retries reached.")
logger.error("Maximum retries reached.")
raise

raise Exception("Could not connect to database after maximum retries")
@@ -36,9 +40,9 @@ def init_database():
wait_for_db()

# Create all tables
print("Creating database tables...")
logger.info("Creating database tables...")
Base.metadata.create_all(bind=engine)
print("Database tables created successfully!")
logger.info("Database tables created successfully!")

if __name__ == "__main__":
init_database()
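
The retry loop in `wait_for_db` follows a common wait-until-ready pattern: try to connect, log the failure, sleep, and re-raise once the attempts are exhausted. Below is a generic sketch of that pattern using the `logging` module as this change does. `wait_until_ready` is a hypothetical name, the built-in `ConnectionError` stands in for `mysql.connector.Error`, and the 5-second default interval is an assumption because the actual value is not visible in this hunk.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def wait_until_ready(
    connect: Callable[[], None],
    max_retries: int = 10,
    retry_interval: float = 5.0,              # assumed default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: call `connect` until it succeeds or retries run out."""
    for attempt in range(1, max_retries + 1):
        try:
            connect()
            logger.info("Successfully connected on attempt %d", attempt)
            return True
        except ConnectionError as exc:        # stand-in for the driver's error type
            logger.error("Connection attempt %d failed: %s", attempt, exc)
            if attempt == max_retries:
                logger.error("Maximum retries reached.")
                raise
            logger.info("Retrying in %s seconds...", retry_interval)
            sleep(retry_interval)
    return False  # only reached when max_retries is less than 1
```

Injecting `sleep` as a parameter keeps the loop testable without real delays.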