A powerful Python tool for enriching book metadata using the Google Books API and web scraping fallbacks. Perfect for librarians, book collectors, and data analysts working with book datasets.
- 📚 Comprehensive Metadata: Fetches extensive book metadata from multiple sources.
- 🔄 Multiple Formats: Supports both CSV and XLSX file formats.
- 🔍 Google Books API Integration: Primary source for metadata.
- 🌐 Web Scraping Fallbacks: Ensures data availability by scraping alternative sources when needed.
- 👤 Interactive Selection: Facilitates user selection for ambiguous matches.
- ⚡ Asynchronous Processing: Enhances performance with async operations.
- 🛡️ Ethical Web Scraping: Implements rate limiting to respect source websites.
- 📊 Structured Output: Ensures consistent formatting in output files.
git clone https://github.com/backyMacky/AI-digital-library/
cd book-enrichment-tool
pip install -r requirements.txt

Ensure you have the following packages installed:
pandas>=1.5.0
aiohttp>=3.8.0
beautifulsoup4>=4.9.0
inquirer>=3.1.0
openpyxl>=3.0.0  # For Excel support
- Get a Google Books API Key: Obtain it from the Google Cloud Console.
- Prepare Your Input File: Create a CSV or XLSX file with the following columns:
  - book_name: Title of the book
  - isbn: ISBN-10 or ISBN-13 number

  Example (books.csv or books.xlsx):

  | book_name | author | isbn |
  | --- | --- | --- |
  | The Great Gatsby | F. Scott Fitzgerald | 9780743273565 |
  | 1984 | George Orwell | 9780451524935 |
- Run the Script:
  python book_enricher.py
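Before running the script, it can help to confirm that the input file has the two required columns and that each ISBN passes its checksum. The following is a minimal sketch using only the standard library; the helper names (`is_valid_isbn13`, `is_valid_isbn10`, `check_rows`) are hypothetical and are not part of `book_enricher.py`, which may or may not perform similar checks itself.

```python
import csv
import io
from typing import List, Tuple


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 checksum (weights alternate 1, 3; total must be divisible by 10)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdecimal():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 checksum (weights 10 down to 1; total must be divisible by 11)."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdecimal():
        return False
    if not (chars[9].isdecimal() or chars[9] == "X"):
        return False
    values = [int(c) for c in chars[:9]]
    values.append(10 if chars[9] == "X" else int(chars[9]))
    total = sum(v * (10 - i) for i, v in enumerate(values))
    return total % 11 == 0


def check_rows(csv_text: str) -> List[Tuple[str, str, bool]]:
    """Return (book_name, isbn, isbn_is_valid) for every row of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"book_name", "isbn"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    results = []
    for row in reader:
        name = (row.get("book_name") or "").strip()
        isbn = (row.get("isbn") or "").strip()
        results.append((name, isbn, is_valid_isbn13(isbn) or is_valid_isbn10(isbn)))
    return results


sample = 'book_name,isbn\n"The Great Gatsby","9780743273565"\n"1984","9780451524935"\n'
for name, isbn, ok in check_rows(sample):
    print(f"{name}: {isbn} -> {'valid' if ok else 'INVALID'}")
```

For XLSX input the same checks would apply once the rows are loaded, for example with pandas; that step is left out to keep the sketch dependency-free.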
Example input (books.csv or books.xlsx):
book_name,isbn
"The Great Gatsby","9780743273565"
"1984","9780451524935"
The output file contains the following columns:
- book_name: Title from source
- isbn: ISBN number
- authors: Comma-separated list of authors
- publisher: Publisher name
- year: Publication year
- pages: Page count
- rating: Average rating (0-5)
- url: Source URL
- source: Data source (google, goodreads, worldcat)
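To make the output layout concrete, here is a small sketch that models one output row and writes it as CSV. The `EnrichedRow` dataclass and `rows_to_csv` helper are hypothetical names used for illustration; the field types are assumptions, and the tool's own `BookData` class may be structured differently.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class EnrichedRow:
    """Hypothetical record mirroring the documented output columns."""
    book_name: str
    isbn: str
    authors: str = ""               # comma-separated list of authors
    publisher: str = ""
    year: Optional[int] = None
    pages: Optional[int] = None
    rating: Optional[float] = None  # average rating, 0-5
    url: str = ""
    source: str = ""                # google, goodreads, or worldcat


def rows_to_csv(rows: Iterable[EnrichedRow]) -> str:
    """Serialize rows to CSV text with a header in the documented column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(EnrichedRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))  # None values are written as empty cells
    return buffer.getvalue()


print(rows_to_csv([EnrichedRow("1984", "9780451524935", authors="George Orwell", source="google")]))
```

Leaving unknown fields empty rather than guessing keeps the file safe to re-process later.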
from book_enricher import BookEnricher
import asyncio
from pathlib import Path
async def main():
    async with BookEnricher("YOUR_API_KEY") as enricher:
        await enricher.process_books(
            Path("books.xlsx"),
            Path("enriched_books.xlsx")
        )

asyncio.run(main())

from book_enricher import BookSource, BookData
from typing import List
class CustomSource(BookSource):
    async def search(self, book_name: str, isbn: str) -> List[BookData]:
        # Your implementation here
        pass

# Add to enricher (inside the `async with` block, where `enricher` exists)
enricher.sources.append(CustomSource(enricher.session))

The tool implements respectful rate limiting:
- 1 second delay between API requests
- Custom user agent headers
- Error handling with exponential backoff
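The retry behaviour listed above can be sketched as follows. This is an illustration of exponential backoff in general, not the tool's actual code: `fetch_with_backoff` and its parameters are hypothetical, and the 1-second base delay is taken from the first bullet. The `sleep` argument is injectable so the logic can be exercised without real waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_backoff(
    request: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await request(); on failure wait base_delay * 2**attempt, then retry."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return await request()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            await sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            attempt += 1
```

In a real client it is usually better to catch only transient errors (timeouts, HTTP 429 or 5xx responses) instead of every `Exception`.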
Data sources:
- Google Books API (primary)
- Goodreads (fallback)
- WorldCat (fallback)
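One way to read the primary/fallback ordering is "try each source in turn and stop at the first one that returns matches". The sketch below shows that variant with stand-in search functions. It is an assumption, not the tool's confirmed behaviour: the interactive selection example lists candidates from more than one source, so the real tool may merge results instead. `search_with_fallback` and the result shape are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence, Tuple

Result = Dict[str, str]
SearchFn = Callable[[str, str], Awaitable[List[Result]]]


async def search_with_fallback(
    sources: Sequence[Tuple[str, SearchFn]],
    book_name: str,
    isbn: str,
) -> List[Result]:
    """Query sources in order; return the first non-empty result set, tagged with its source."""
    for name, search in sources:
        try:
            matches = await search(book_name, isbn)
        except Exception:
            continue  # a failing source is treated like an empty one
        if matches:
            return [{**match, "source": name} for match in matches]
    return []  # no source returned anything
```

Passing the sources as an ordered sequence keeps the priority explicit and makes it easy to add or reorder fallbacks.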
When multiple matches are found, the tool presents an interactive selection:
Choose the correct book for 'The Great Gatsby':
❯ The Great Gatsby by F. Scott Fitzgerald (1925) - from google
  The Great Gatsby by F. Scott Fitzgerald (2004) - from goodreads
  Skip this book
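A prompt like the one above could be built with the `inquirer` package from the requirements list. The sketch below is illustrative only: `build_choices`, `choose_book`, and the candidate dictionary keys are hypothetical, and the tool's own prompt code may differ. `inquirer` is imported lazily so the label-building part runs without it.

```python
from typing import Dict, List, Optional, Tuple

SKIP = -1  # sentinel value for the "Skip this book" entry


def build_choices(candidates: List[Dict]) -> List[Tuple[str, int]]:
    """Return (label, index) pairs for each candidate, plus a final skip option."""
    choices = [
        (f"{c['title']} by {c['authors']} ({c['year']}) - from {c['source']}", i)
        for i, c in enumerate(candidates)
    ]
    choices.append(("Skip this book", SKIP))
    return choices


def choose_book(book_name: str, candidates: List[Dict]) -> Optional[Dict]:
    """Ask the user to pick a candidate; return None if skipped or cancelled."""
    import inquirer  # third-party dependency, only needed for the live prompt

    question = inquirer.List(
        "choice",
        message=f"Choose the correct book for '{book_name}'",
        choices=build_choices(candidates),
    )
    answers = inquirer.prompt([question])
    if not answers or answers["choice"] == SKIP:
        return None
    return candidates[answers["choice"]]
```

Keeping the label formatting separate from the prompt makes the selection logic testable without a terminal.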
Contributions are welcome! Follow these steps:
- Fork the Repository
- Create a Feature Branch:
  git checkout -b feature/amazing-feature
- Commit Your Changes:
  git commit -m 'Add amazing feature'
- Push to the Branch:
  git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License.
- Google Books API for primary data source
- Beautiful Soup for web scraping capabilities
- Pandas for data handling
- Inquirer for interactive CLI
For detailed documentation, see:
For support:
- Check existing Issues
- Open a new issue
- Include sample data and full error traceback when reporting bugs
Built with ❤️ by Martin Bacigal
