A text analysis project that explores the relationships between 64 literary works. It compares the books pairwise to surface patterns and similarities in classic literature.
- Identifies the 100 most frequent words across the literary corpus
- Generates a comprehensive 64x64 similarity matrix
- Reveals the top 10 most closely related book pairs
- Processes and analyzes raw text with precision
- Implements robust comparison algorithms
The analysis surfaced several notable connections between books:
- Gerard's Herbal volumes show remarkable similarity (91% match between Vol. 3 and Vol. 4)
- Memoirs of Laetitia Pilkington demonstrates strong narrative consistency across volumes
- Foxe's Book of Martyrs maintains thematic coherence across its parts
- Custom Book class for efficient text representation
- Advanced text preprocessing pipeline
- Sophisticated similarity calculation algorithms
- Comprehensive word frequency analysis
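As an illustration of the components listed above, here is a minimal sketch of what a `Book` class with built-in preprocessing might look like. The project's actual class is not shown in this README, so the member names (`count`, `total_words`, `counts`) and the preprocessing rules (lowercase everything, split on any non-letter) are assumptions made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: not the project's real Book class.
// Assumed preprocessing: lowercase ASCII letters, split on everything else.
class Book {
public:
    Book(std::string title, const std::string& raw_text)
        : title_(std::move(title)) {
        assert(!title_.empty());  // a book needs a title to be reported later
        std::string word;
        for (unsigned char c : raw_text) {
            if (std::isalpha(c)) {
                word += static_cast<char>(std::tolower(c));
            } else if (!word.empty()) {
                add_word(word);
                word.clear();
            }
        }
        if (!word.empty()) {
            add_word(word);  // flush the last word if the text ends on a letter
        }
    }

    const std::string& title() const { return title_; }
    std::size_t total_words() const { return total_words_; }

    // Occurrences of an already-lowercased word; 0 if it never appears.
    std::size_t count(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }

    const std::map<std::string, std::size_t>& counts() const { return counts_; }

private:
    void add_word(const std::string& word) {
        ++counts_[word];
        ++total_words_;
    }

    std::string title_;
    std::map<std::string, std::size_t> counts_;
    std::size_t total_words_ = 0;
};
```

Storing the counts in a `std::map` keeps words in sorted order, which makes later output deterministic; a `std::unordered_map` would likely be faster for large books if ordering does not matter.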
- Language: C++
- Data Format: Raw text files
- Analysis Method: Statistical text comparison
The program efficiently processes 64 books and performs:
- Word frequency analysis across the entire corpus
- 2,016 unique book-to-book comparisons (64 × 63 / 2)
- Similarity ranking and sorting
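The comparison and ranking steps above can be sketched as follows. This README does not say which similarity measure the project uses, so cosine similarity over word counts stands in here purely as an assumption; the names `WordCounts`, `PairScore`, `pair_count`, `cosine_similarity`, and `rank_pairs` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using WordCounts = std::map<std::string, std::size_t>;

struct PairScore {
    std::size_t first;   // index of the first book
    std::size_t second;  // index of the second book (always > first)
    double score;        // similarity in [0, 1]
};

// Number of unordered pairs among n books: n * (n - 1) / 2.
constexpr std::size_t pair_count(std::size_t n) { return n * (n - 1) / 2; }

// Stand-in measure (an assumption, not necessarily the project's method):
// cosine similarity between two word-count vectors.
double cosine_similarity(const WordCounts& a, const WordCounts& b) {
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (const auto& entry : a) {
        const double n = static_cast<double>(entry.second);
        norm_a += n * n;
        auto it = b.find(entry.first);
        if (it != b.end()) {
            dot += n * static_cast<double>(it->second);
        }
    }
    for (const auto& entry : b) {
        const double n = static_cast<double>(entry.second);
        norm_b += n * n;
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0;  // an empty book is treated as similar to nothing
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Scores every unordered pair (i < j) once and returns the top_k best.
std::vector<PairScore> rank_pairs(const std::vector<WordCounts>& books,
                                  std::size_t top_k) {
    std::vector<PairScore> pairs;
    pairs.reserve(pair_count(books.size()));
    for (std::size_t i = 0; i < books.size(); ++i) {
        for (std::size_t j = i + 1; j < books.size(); ++j) {
            pairs.push_back({i, j, cosine_similarity(books[i], books[j])});
        }
    }
    assert(pairs.size() == pair_count(books.size()));
    // stable_sort keeps tied pairs in index order, so output is deterministic.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const PairScore& x, const PairScore& y) {
                         return x.score > y.score;
                     });
    if (pairs.size() > top_k) {
        pairs.resize(top_k);
    }
    return pairs;
}
```

With 64 books, `pair_count(64)` gives the 2,016 comparisons mentioned above, and `rank_pairs(books, 10)` would return the ten highest-scoring pairs under this stand-in measure.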
- Clone the repository
- Ensure all book text files are in the designated directory
- Compile the C++ source files
- Run the executable
- Find results in the output directory
The program generates several output files:
- common_words.txt: Top 100 most frequent words
- similarity_matrix.txt: Complete comparison matrix
- similar_books.txt: Top 10 most similar book pairs
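For illustration, the sketch below shows one way the contents of `common_words.txt` could be produced from corpus-wide word counts. The actual file layout is not documented here, so the one-entry-per-line `word count` format, the alphabetical tie-breaking, and the function names `top_words` and `write_common_words` are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using WordCounts = std::map<std::string, std::size_t>;
using WordCount = std::pair<std::string, std::size_t>;

// Returns up to `limit` words, most frequent first.
// Ties are broken alphabetically (an assumption) so output is deterministic.
std::vector<WordCount> top_words(const WordCounts& corpus, std::size_t limit) {
    std::vector<WordCount> ranked(corpus.begin(), corpus.end());
    const std::size_t keep = std::min(limit, ranked.size());
    std::partial_sort(ranked.begin(),
                      ranked.begin() + static_cast<std::ptrdiff_t>(keep),
                      ranked.end(),
                      [](const WordCount& x, const WordCount& y) {
                          if (x.second != y.second) {
                              return x.second > y.second;
                          }
                          return x.first < y.first;
                      });
    ranked.resize(keep);
    assert(ranked.size() <= limit);
    return ranked;
}

// Writes one "word count" entry per line (assumed format).
void write_common_words(std::ostream& out, const std::vector<WordCount>& words) {
    for (const auto& entry : words) {
        out << entry.first << ' ' << entry.second << '\n';
    }
}
```

Passing a `std::ofstream` opened on `common_words.txt` together with `top_words(corpus, 100)` would write the top 100 entries; taking a `std::ostream&` also lets the same function be checked against a `std::ostringstream`.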
- Natural Language Processing integration
- Interactive visualization dashboard
- Semantic analysis capabilities
- Genre classification features
- Multi-language support
The current implementation is single-threaded. This keeps the program simple and its results reproducible from run to run, at the cost of processing speed.
Special thanks to:
- The professors for their guidance
- The open-source community for inspiration
- Project collaborators for their valuable input
Created with ❤️ by a passionate programmer
RAUNAK KUMAR JHA