|
4 | 4 | [](https://codecov.io/gh/ayonious/File-Compression) |
5 | 5 | [](https://github.com/ayonious/File-Compression/stargazers) |
6 | 6 |
|
7 | | -A File Compression software that helps zip/Unzip files using these 2 algorihtms: |
| 7 | +A Java-based file compression application implementing two classic compression algorithms: |
8 | 8 |
|
9 | | -1. Huffmans Code |
10 | | -2. Lempel-Ziv-Wells algorithm |
| 9 | +1. **Huffman Coding** - Frequency-based compression |
| 10 | +2. **LZW (Lempel-Ziv-Welch)** - Dictionary-based compression |
11 | 11 |
|
12 | | -# About Huffmans Code |
| 12 | +## Compression Algorithms |
13 | 13 |
|
14 | | -The Huffmans algo creates a 1-1 mapping for each byte of the input file |
15 | | -and replaces each byte with the mapped bit sequence. For this you need |
16 | | -to store a dictionary that describes each 1-1 mapping of input byte and |
17 | | -binary sequence.(which needs extraspace) |
| 14 | +### Huffman Coding |
18 | 15 |
|
19 | | -# About Lempel-Ziv-Wells |
| 16 | +**How it works:** |
| 17 | +Huffman coding is a **lossless compression algorithm** that assigns variable-length binary codes to characters based on their frequency. Characters that appear more frequently get shorter codes, while rare characters get longer codes. |
20 | 18 |
|
21 | | -Unlike Huffmans code LZW dont need an extra dictionary to be saved. Also |
22 | | -LZW does not create a mapping to byte to bin sequence. It creates mapping |
23 | | -of multiple byte to binary sequence. |
| 19 | +**Example:** |
| 20 | +``` |
| 21 | +Input text: "aabbc" |
| 22 | +Frequency: a=2, b=2, c=1 |
| 23 | +
|
| 24 | +Huffman codes assigned: |
| 25 | +a → 0 |
| 26 | +b → 10 |
| 27 | +c → 11 |
| 28 | +
|
| 29 | +Compressed: "0 0 10 10 11" = "00101011" (8 bits) |
| 30 | +Original: 5 characters × 8 bits = 40 bits |
| 31 | +Compression ratio: 80% reduction (before counting the stored frequency table)
| 32 | +``` |
| 33 | + |
| 34 | +**Key characteristics:** |
| 35 | +- Creates a frequency table and binary tree during compression |
| 36 | +- Requires storing the frequency table in the compressed file (overhead) |
| 37 | +- Works best with files that have **uneven character distribution** |
| 38 | +- Typical use cases: Text files, source code, log files |
| 39 | + |
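The Huffman steps described above (count frequencies, build a binary tree, read codes off the tree) can be sketched in Java as follows. This is a minimal illustration, not the repository's implementation: the class and method names (`HuffmanSketch`, `buildCodes`, `encode`) are hypothetical, it works on `char` values rather than raw bytes, and it returns the bit sequence as a `String` of `0`/`1` characters for readability. The exact codes depend on how ties between equal frequencies are broken, but the total encoded length for a given input does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {
    // Tree node: leaves carry a symbol, internal nodes carry two children.
    static final class Node {
        final char symbol;
        final int freq;
        final Node left;
        final Node right;

        Node(char symbol, int freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds a symbol -> bit-string table for the given text.
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        // Min-heap ordered by frequency.
        PriorityQueue<Node> queue =
                new PriorityQueue<>((x, y) -> Integer.compare(x.freq, y.freq));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }

        // Repeatedly merge the two least frequent nodes into one parent.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.freq + b.freq, a, b));
        }

        Map<Character, String> codes = new HashMap<>();
        Node root = queue.poll();
        if (root != null) {
            assign(root, "", codes);
        }
        return codes;
    }

    // Walks the tree: left edge appends '0', right edge appends '1'.
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            // A single-symbol input still needs a one-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    // Encodes the text as a string of '0' and '1' characters.
    static String encode(String text) {
        Map<Character, String> codes = buildCodes(text);
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String text = "aabbc";
        System.out.println(buildCodes(text));
        System.out.println(encode(text).length() + " bits");
    }
}
```

For `"aabbc"` the encoded length is 8 bits, matching the example above. A real compressor would also have to write the frequency table (or the tree) into the output so the decompressor can rebuild the same codes.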
| 40 | +### LZW (Lempel-Ziv-Welch) |
| 41 | + |
| 42 | +**How it works:** |
| 43 | +LZW is a **dictionary-based compression algorithm** that builds a dictionary of sequences on-the-fly. Instead of replacing individual bytes, it replaces repeated sequences of bytes with dictionary codes. |
| 44 | + |
| 45 | +**Example:** |
| 46 | +``` |
| 47 | +Input text: "ABABABA" |
| 48 | +
|
| 49 | +Initial dictionary (one entry per byte value, codes 0-255):
| 50 | +65: A
| 51 | +66: B
| 52 | +...
| 53 | +
|
| 54 | +Compression process (new entries start at code 256):
| 55 | +- Read 'A' → output 65, add "AB" to dictionary (256)
| 56 | +- Read 'B' → output 66, add "BA" to dictionary (257)
| 57 | +- Read "AB" (found in dict!) → output 256, add "ABA" to dictionary (258)
| 58 | +- Read "ABA" (found in dict!) → output 258
| 59 | +
|
| 60 | +Compressed: [65, 66, 256, 258]
| 61 | +Original: 7 characters × 8 bits = 56 bits
| 62 | +Compressed: 4 codes × ~9 bits = 36 bits
| 63 | +Compression ratio: 35.7% reduction
| 64 | +``` |
| 65 | + |
| 66 | +**Key characteristics:** |
| 67 | +- **No dictionary is stored** - both compressor and decompressor build the same dictionary |
| 68 | +- Replaces **sequences of bytes**, not individual bytes |
| 69 | +- Works best with files that have **repetitive patterns** |
| 70 | +- Typical use cases: Log files, structured data (JSON, XML), source code with repeated patterns |
| 71 | + |
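The claim that no dictionary needs to be stored can be checked with a small round trip. The sketch below is an assumption-laden illustration rather than the project's code: the names `LzwSketch`, `compress`, and `decompress` are hypothetical, it follows the common textbook convention in which codes 0-255 stand for single byte values and new sequences start at code 256, and it emits codes as a list of integers instead of packing them into bits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwSketch {
    // Compresses the input into a list of dictionary codes.
    // Assumes every character of the input is in the range 0-255.
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;

        List<Integer> out = new ArrayList<>();
        String current = "";
        for (char c : input.toCharArray()) {
            String candidate = current + c;
            if (dict.containsKey(candidate)) {
                // Keep extending the longest known sequence.
                current = candidate;
            } else {
                out.add(dict.get(current));
                dict.put(candidate, nextCode++);
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            out.add(dict.get(current));
        }
        return out;
    }

    // Rebuilds the text from the codes alone, recreating the dictionary on the fly.
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) {
            return "";
        }
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(i, String.valueOf((char) i));
        }
        int nextCode = 256;

        String previous = dict.get(codes.get(0));
        StringBuilder text = new StringBuilder(previous);
        for (int i = 1; i < codes.size(); i++) {
            int code = codes.get(i);
            String entry;
            if (dict.containsKey(code)) {
                entry = dict.get(code);
            } else if (code == nextCode) {
                // The code refers to the entry the compressor had only just created.
                entry = previous + previous.charAt(0);
            } else {
                throw new IllegalArgumentException("Invalid LZW code: " + code);
            }
            text.append(entry);
            dict.put(nextCode++, previous + entry.charAt(0));
            previous = entry;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<Integer> codes = compress("ABABABA");
        System.out.println(codes);             // prints [65, 66, 256, 258]
        System.out.println(decompress(codes)); // prints ABABABA
    }
}
```

Under this convention `"ABABABA"` compresses to four codes. The last code, 258, is one the decompressor has not yet defined when it arrives, which is why `decompress` needs the `code == nextCode` branch.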
| 72 | +### Comparison |
| 73 | + |
| 74 | +| Feature | Huffman Coding | LZW | |
| 75 | +|---------|---------------|-----| |
| 76 | +| **Method** | Frequency-based | Dictionary-based | |
| 77 | +| **Dictionary stored?** | Yes (frequency table) | No (built on-the-fly) | |
| 78 | +| **Best for** | Uneven character distribution | Repetitive patterns | |
| 79 | +| **Space overhead** | Medium (frequency table) | Low (no dictionary) | |
| 80 | +| **Example use case** | Text with skewed character frequency | Log files, structured data | |
24 | 81 |
|
25 | 82 | ## Installation |
26 | 83 |
|
@@ -67,7 +124,7 @@ mvn exec:java |
67 | 124 | ### Using JAR directly |
68 | 125 | After building with Maven, you can run the JAR: |
69 | 126 | ```bash |
70 | | -java -jar target/file-compression-1.0-SNAPSHOT-jar-with-dependencies.jar |
| 127 | +java -jar target/file-compression-2.0-SNAPSHOT-jar-with-dependencies.jar |
71 | 128 | ``` |
72 | 129 |
|
73 | 130 |  |
|