Skip to content

backyMacky/Digital-Library-with-OpenAI-and-Google-Books

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

Book Metadata Enrichment Tool with OpenAI and Google Books

License Python Version Stars

Clone Repository Download ZIP Installation Guide Usage Guide

A powerful Python tool for enriching book metadata using the Google Books API and web scraping fallbacks. Perfect for librarians, book collectors, and data analysts working with book datasets.

Buy Me A Coffee

Features

  • 📚 Comprehensive Metadata: Fetches extensive book metadata from multiple sources.
  • 🔄 Multiple Formats: Supports both CSV and XLSX file formats.
  • 🔍 Google Books API Integration: Primary source for metadata.
  • 🌐 Web Scraping Fallbacks: Ensures data availability by scraping alternative sources when needed.
  • 👤 Interactive Selection: Facilitates user selection for ambiguous matches.
  • Asynchronous Processing: Enhances performance with async operations.
  • 🛡️ Ethical Web Scraping: Implements rate limiting to respect source websites.
  • 📊 Structured Output: Ensures consistent formatting in output files.

Installation

Clone the Repository

git clone https://github.com/backyMacky/AI-digital-library/
cd book-enrichment-tool

Install Dependencies

pip install -r requirements.txt

Requirements

Ensure you have the following packages installed:

  • pandas>=1.5.0
  • aiohttp>=3.8.0
  • beautifulsoup4>=4.9.0
  • inquirer>=3.1.0
  • openpyxl>=3.0.0 # For Excel support

Quick Start

  1. Get a Google Books API Key: Obtain it from the Google Cloud Console.

  2. Prepare Your Input File: Create a CSV or XLSX file with the following columns:

    • book_name: Title of the book
    • isbn: ISBN-10 or ISBN-13 number

    Example (books.csv or books.xlsx):

    name author isbn
    The Great Gatsby Lex Luthor 9780743273565
    1984 George Orwell 9780743273565
  3. Run the Script:

    python book_enricher.py

Input/Output Format

Input File Example

books.csv or books.xlsx:

book_name,isbn
"The Great Gatsby","9780743273565"
"1984","9780451524935"

Output Fields

  • book_name: Title from source
  • isbn: ISBN number
  • authors: Comma-separated list of authors
  • publisher: Publisher name
  • year: Publication year
  • pages: Page count
  • rating: Average rating (0-5)
  • url: Source URL
  • source: Data source (google, goodreads, worldcat)

Usage Examples

Basic Usage

from book_enricher import BookEnricher
import asyncio
from pathlib import Path

async def main():
    async with BookEnricher("YOUR_API_KEY") as enricher:
        await enricher.process_books(
            Path("books.xlsx"),
            Path("enriched_books.xlsx")
        )

asyncio.run(main())

Custom Source Integration

from book_enricher import BookSource, BookData
from typing import List

class CustomSource(BookSource):
    async def search(self, book_name: str, isbn: str) -> List[BookData]:
        # Your implementation here
        pass

# Add to enricher
enricher.sources.append(CustomSource(enricher.session))

Advanced Features

Rate Limiting

The tool implements respectful rate limiting:

  • 1 second delay between API requests
  • Custom user agent headers
  • Error handling with exponential backoff

Data Sources

  • Google Books API (primary)
  • Goodreads (fallback)
  • WorldCat (fallback)

Interactive Selection

When multiple matches are found, the tool presents an interactive selection:

Choose the correct book for 'The Great Gatsby':
❯ The Great Gatsby by F. Scott Fitzgerald (1925) - from google
  The Great Gatsby by F. Scott Fitzgerald (2004) - from goodreads
  Skip this book

Contributing

Contributions are welcome! Follow these steps:

  1. Fork the Repository

  2. Create a Feature Branch:

    git checkout -b feature/amazing-feature
  3. Commit Your Changes:

    git commit -m 'Add amazing feature'
  4. Push to the Branch:

    git push origin feature/amazing-feature
  5. Open a Pull Request

License

This project is licensed under the MIT License.

Acknowledgments

  • Google Books API for primary data source
  • Beautiful Soup for web scraping capabilities
  • Pandas for data handling
  • Inquirer for interactive CLI

Documentation

For detailed documentation, see:

Support

For support:

  • Check existing Issues
  • Open a new issue
    • Include sample data and full error traceback when reporting bugs

Built with ❤️ by Martin Bacigal

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages