
raunak0400/similarity-checking-from-books

📚 Book Comparison Project

🎯 Overview

A text analysis project that measures how similar 64 literary works are to one another. By comparing the books' word-frequency statistics, it surfaces patterns and similarities in classic literature.

✨ Key Features

  • Identifies the 100 most frequent words across the corpus
  • Generates a 64x64 similarity matrix covering every pair of books
  • Ranks the 10 most closely related book pairs
  • Preprocesses the raw text before analysis
  • Compares every pair of books statistically
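Finding the most common words across the corpus can be sketched as below. This is a minimal illustration, not the repository's code: it assumes each book's words have already been counted into a `std::map<std::string, int>`, and the names `WordCounts`, `addCounts`, and `topWords` are hypothetical. It requires C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch only: hypothetical helper names, not the project's actual code.
using WordCounts = std::map<std::string, int>;

// Adds one book's word counts into the corpus-wide table.
void addCounts(WordCounts& corpus, const WordCounts& book) {
    for (const auto& [word, count] : book) corpus[word] += count;
}

// Returns the n most frequent words, highest count first. Ties are broken
// alphabetically so the result is the same on every run.
std::vector<std::pair<std::string, int>> topWords(const WordCounts& corpus,
                                                  std::size_t n) {
    std::vector<std::pair<std::string, int>> entries(corpus.begin(), corpus.end());
    n = std::min(n, entries.size());
    auto middle = entries.begin() + static_cast<std::ptrdiff_t>(n);
    // partial_sort only orders the first n entries, which is cheaper than
    // sorting the whole vocabulary when n (e.g. 100) is small.
    std::partial_sort(entries.begin(), middle, entries.end(),
                      [](const auto& a, const auto& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    entries.erase(middle, entries.end());
    return entries;
}
```

Calling `topWords(corpus, 100)` after merging every book with `addCounts` would yield a list like the one written to common_words.txt.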

📊 Results Highlight

The analysis found several notable connections between books:

  1. The volumes of Gerard's Herbal are highly similar to one another (91% similarity between Vol. 3 and Vol. 4)
  2. The volumes of the Memoirs of Laetitia Pilkington score as closely related to one another
  3. The different parts of Foxes Book of Martyrs remain consistently similar to each other

🛠️ Technical Implementation

Core Components

  • A custom Book class that represents each book's text
  • A text preprocessing pipeline
  • Similarity calculation between books
  • Word frequency analysis across the corpus
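The Book class itself is not shown in this README, so the sketch below is only one plausible shape for it. The class layout, the member names, and the normalization rule (lowercase each token and keep only letters) are all assumptions made for illustration.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for feeding in-memory text
#include <string>
#include <utility>

// Hypothetical Book class: member names are illustrative, not the project's.
class Book {
public:
    // Reads whitespace-separated tokens and counts the normalized words.
    Book(std::string title, std::istream& text) : title_(std::move(title)) {
        std::string token;
        while (text >> token) {
            std::string word = normalize(token);
            if (!word.empty()) {  // tokens with no letters (e.g. "42") are skipped
                ++counts_[word];
                ++totalWords_;
            }
        }
    }

    const std::string& title() const { return title_; }
    const std::map<std::string, int>& counts() const { return counts_; }
    int totalWords() const { return totalWords_; }

    // Relative frequency of a word (0 if the book never uses it).
    double frequency(const std::string& word) const {
        auto it = counts_.find(word);
        if (it == counts_.end() || totalWords_ == 0) return 0.0;
        return static_cast<double>(it->second) / totalWords_;
    }

private:
    // Lowercases a token and keeps only letters (ASCII in the default "C"
    // locale), so "Herbal," and "herbal" count as the same word.
    static std::string normalize(const std::string& token) {
        std::string out;
        for (unsigned char c : token) {
            if (std::isalpha(c)) out += static_cast<char>(std::tolower(c));
        }
        return out;
    }

    std::string title_;
    std::map<std::string, int> counts_;
    int totalWords_ = 0;
};
```

In practice a `std::ifstream` opened on a book's text file would be passed instead of a string stream. Note that this simple rule also merges contractions, so "don't" becomes "dont".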

Technology Stack

  • Language: C++
  • Data Format: Raw text files
  • Analysis Method: Statistical text comparison
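The README does not say which statistical measure produces scores such as the 91% figure above. As one hedged example, cosine similarity between two books' word-count vectors is a common choice; the sketch below implements that measure, which may differ from the project's actual method. `cosineSimilarity` is a hypothetical name.

```cpp
#include <cmath>
#include <map>
#include <string>

// Sketch of one possible metric (cosine similarity), not necessarily the
// project's. It is the dot product of the two count vectors divided by the
// product of their lengths; for non-negative counts the result is in [0, 1].
double cosineSimilarity(const std::map<std::string, int>& a,
                        const std::map<std::string, int>& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& [word, countA] : a) {
        normA += static_cast<double>(countA) * countA;
        auto it = b.find(word);
        if (it != b.end()) dot += static_cast<double>(countA) * it->second;
    }
    for (const auto& entry : b) {
        normB += static_cast<double>(entry.second) * entry.second;
    }
    // An empty book has no direction to compare against.
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Cosine similarity depends only on the direction of the count vectors, not their length, so raw counts and relative frequencies give the same score and long and short books compare fairly. Multiplying the result by 100 gives a percentage.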

📈 Performance

The program processes 64 books and performs:

  • Word frequency analysis across the entire corpus
  • 2,016 unique book-to-book comparisons
  • Similarity ranking and sorting
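With 64 books there are 64 × 63 / 2 = 2,016 unordered pairs, which is where the comparison count comes from. The ranking step can be sketched as follows, assuming the scores are stored in a symmetric `std::vector<std::vector<double>>` matrix; `BookPair` and `topPairs` are hypothetical names, not taken from the repository.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record for one compared pair of books.
struct BookPair {
    std::size_t first;   // index of the first book
    std::size_t second;  // index of the second book (first < second)
    double score;        // similarity between the two books
};

// Collects every unordered pair (i, j) with i < j from a symmetric
// similarity matrix and returns the k highest-scoring pairs.
// For n books this visits n * (n - 1) / 2 pairs: 2,016 when n = 64.
std::vector<BookPair> topPairs(const std::vector<std::vector<double>>& matrix,
                               std::size_t k) {
    std::vector<BookPair> pairs;
    const std::size_t n = matrix.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            pairs.push_back({i, j, matrix[i][j]});
    // Highest score first; stable_sort keeps the (i, j) order for ties.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const BookPair& a, const BookPair& b) {
                         return a.score > b.score;
                     });
    if (pairs.size() > k) pairs.resize(k);
    return pairs;
}
```

Only the upper triangle is visited because a symmetric matrix stores each pair twice and the diagonal compares a book with itself.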

🚀 Getting Started

  1. Clone the repository
  2. Ensure all book text files are in the designated directory
  3. Compile the C++ source files
  4. Run the executable
  5. Find results in the output directory

📁 Output Files

The program generates several output files:

  • common_words.txt: Top 100 most frequent words
  • similarity_matrix.txt: Complete comparison matrix
  • similar_books.txt: Top 10 most similar book pairs
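The exact layout of similarity_matrix.txt is not documented here. The sketch below assumes one tab-separated row per book with scores written as percentages; the function name `writeSimilarityMatrix` and the format are illustrative assumptions.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>  // std::ostringstream, useful for checking output in memory
#include <string>
#include <vector>

// Hypothetical writer: one line per book with its title followed by its
// scores against every book, as percentages with one decimal place.
// Note: leaves the stream in fixed, one-decimal formatting mode.
void writeSimilarityMatrix(std::ostream& out,
                           const std::vector<std::string>& titles,
                           const std::vector<std::vector<double>>& matrix) {
    out << std::fixed << std::setprecision(1);
    for (std::size_t i = 0; i < matrix.size(); ++i) {
        out << titles[i];
        for (double score : matrix[i]) out << '\t' << score * 100.0;
        out << '\n';
    }
}
```

Passing a `std::ofstream` opened on similarity_matrix.txt would write the file; taking a generic `std::ostream&` keeps the function easy to test.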

🔍 Future Scope

  • Natural Language Processing integration
  • Interactive visualization dashboard
  • Semantic analysis capabilities
  • Genre classification features
  • Multi-language support

💡 Technical Note

The current implementation is single-threaded, which favors simplicity and stability over speed. Avoiding concurrency also keeps the results deterministic from run to run, at the cost of longer processing times.

🀝 Acknowledgments

Special thanks to:

  • The professors for their guidance
  • The open-source community for inspiration
  • Project collaborators for their valuable input

Created with ❤️ by a passionate programmer
RAUNAK KUMAR JHA
